GenCore version 4.5
Copyright (c) 1993 - 2000 Compugen Ltd.

OM nucleic - nucleic search, using sw model

Run on:          February 7, 2002, 10:55:05 ; Search time 3842.15 Seconds
                                              (without alignments)
                                              1906.419 Million cell updates/sec

Title:           US-09-394-745-6154
Perfect score:   444
Sequence:        1 cgaaaacactggtacccaaa tcccattttaagaaataaat 444

Scoring table:   IDENTITY_NUC
                 Gapop 10.0 , Gapext 1.0

Searched:        1472140 seqs, 8248589755 residues

Total number of hits satisfying chosen parameters:      2944280

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries
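The throughput figure in the header can be sanity-checked from the other header values. The sketch below assumes (the report does not say so) that one "cell update" is one query-residue by database-residue comparison and that both strands of every database sequence were searched; the second assumption would also explain why the hit total (2944280) is twice the number of database sequences (1472140). The function name and parameters are illustrative only.

```python
def cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Return dynamic-programming cell updates per second, in millions.

    Assumption: every query residue is compared against every database
    residue once per searched strand.
    """
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


# Values copied from the report header.
rate = cell_updates_per_sec(444, 8_248_589_755, 3842.15)
print(f"{rate:.3f} Million cell updates/sec")
```

Under these assumptions the computed rate agrees with the 1906.419 Million cell updates/sec printed in the header.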



Database :       GenEmbl:*
                 1:  gb_ba:*
                 2:  gb_htg:*
                 3:  gb_in:*
                 4:  gb_om:*
                 5:  gb_ov:*
                 6:  gb_pat:*
                 7:  gb_ph:*
                 8:  gb_pl:*
                 9:  gb_pr:*
                 10: gb_ro:*
                 11: gb_sts:*
                 12: gb_sy:*
                 13: gb_un:*
                 14: gb_vi:*
                 15: em_ba:*
                 16: em_fun:*
                 17: em_hum:*
                 18: em_in:*
                 19: em_om:*
                 20: em_or:*
                 21: em_ov:*
                 22: em_pat:*
                 23: em_ph:*
                 24: em_pl:*
                 25: em_ro:*
                 26: em_sts:*
                 27: em_sy:*
                 28: em_un:*
                 29: em_vi:*
                 30: em_htgo_hum:*
                 31: em_htgo_inv:*
                 32: em_htgo_rod:*
                 33: em_htg_hum:*
                 34: em_htg_inv:*
                 35: em_htg_rod:*
                 36: em_htg_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

                                 SUMMARIES

                   %
 Result           Query
    No.   Score   Match   Length  DB  ID          Description

      1   353.4    79.6      481   8  ZMA133529   AJ133529 Zea mays
      2   284      64.0      379   6  AX015683    AX015683 Sequence
      3    58.8    13.2      592   8  ZMA297902   AJ297902 Zea mays
      4    42.4     9.5      579   8  ZMA297903   AJ297903 Zea mays
 c    5    41.4     9.3     7218   6  I66494      I66494 Sequence 14
 c    6    39.4     8.9    41334   3  CELC30B5    U23450 Caenorhabdi
      7    37.8     8.5   188254   2  AC080021    AC080021 Mus muscu
      8    36.8     8.3   170225   2  AC020602    AC020602 Homo sapi
      9    36       8.1    49307   3  CELY34D9A   AC024756 Caenorhab
 c   10    36       8.1   166214   2  AC006735    AC006735 Caenorhab
     11    36       8.1   183980   2  AC011081    AC011081 Homo sapi
 c   12    35.6     8.0    63325   9  AL353592    AL353592 Human DNA
     13    35.4     8.0      563   8  ZMA297901   AJ297901 Zea mays
     14    35.4     8.0    78874   2  AL355521    AL355521 Homo sapi
     15    35.4     8.0   174986   2  AC064821    AC064821 Homo sapi
     16    35.4     8.0   175352   2  AC092491    AC092491 Homo sapi
     17    35.2     7.9     1680   8  AB008680    AB008680 Glycine m
     18    35.2     7.9     3636   8  SOYBPSP     M13759 Glycine max
 c   19    35       7.9   105383   2  AC010057    AC010057 Drosophil
     20    35       7.9   145087   2  AC019753    AC019753 Drosophil
 c   21    35       7.9   159108   2  AC026031    AC026031 Homo sapi
 c   22    35       7.9   168047   3  AC091219    AC091219 Drosophil
 c   23    35       7.9   302527   3  AE003469    AE003469 Drosophil
 c   24    34.8     7.8   112203   9  HSJ519P24   AL050401 Human DNA
     25    34.6     7.8     1141   6  AX083744    AX083744 Sequence
     26    34.6     7.8   161799   9  AC002091    AC002091 Genomic s
 c   27    34.6     7.8   182341   2  AC073337    AC073337 Homo sapi
 c   28    34.6     7.8   186552   2  AC090610    AC090610 Homo sapi
     29    34.6     7.8   191686   9  AL359197    AL359197 Human DNA
     30    34.4     7.7    89779   8  AB005234    AB005234 Arabidops
 c   31    34.4     7.7   111547   2  AP002332    AP002332 Homo sapi
     32    34.4     7.7   169230   2  AC012211    AC012211 Homo sapi
     33    34.4     7.7   180041   2  AC009831    AC009831 Homo sapi
     34    34.4     7.7   189181   2  AP001333    AP001333 Homo sapi
 c   35    34.2     7.7     1404   6  E03536      E03536 DNA sequenc
 c   36    34.2     7.7     1404   6  E08057      E08057 DNA encodin
 c   37    34.2     7.7     1404   6  E08058      E08058 DNA encodin
 c   38    34.2     7.7     1404   6  E08059      E08059 DNA encodin
 c   39    34.2     7.7     1404   6  E08060      E08060 DNA encodin
 c   40    34.2     7.7     1404   6  I23831      I23831 Sequence 3
 c   41    34.2     7.7     1404   6  I23832      I23832 Sequence 5
 c   42    34.2     7.7     1404   6  I23833      I23833 Sequence 7
 c   43    34.2     7.7     1404   6  I43342      I43342 Sequence 2
 c   44    34.2     7.7     1404   6  I43343      I43343 Sequence 3
 c   45    34.2     7.7     1404   6  I43344      I43344 Sequence 4
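
The "% Query Match" column of the summaries appears to be the hit score expressed as a percentage of the perfect score (444) given in the report header. A minimal sketch under that assumption follows; the function name is illustrative and not part of GenCore.

```python
PERFECT_SCORE = 444  # "Perfect score" from the report header


def query_match_percent(score, perfect=PERFECT_SCORE):
    """Express a hit score as a percentage of the perfect (self-match) score."""
    return 100.0 * score / perfect


# (result number, score) pairs taken from the summaries.
for result_no, score in [(1, 353.4), (2, 284), (3, 58.8), (45, 34.2)]:
    print(result_no, f"{query_match_percent(score):.1f}")
```

For these four rows the computed percentages (79.6, 64.0, 13.2 and 7.7) agree with the values listed in the summaries.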



ALIGNMENTS 



RESULT 1
ZMA133529
LOCUS       ZMA133529     481 bp    mRNA            PLN       02-DEC-1999
DEFINITION  Zea mays mRNA for BETL2 protein.
ACCESSION   AJ133529
VERSION     AJ133529.1  GI:5042328
KEYWORDS    betl2 gene; BETL2 protein.
SOURCE      Zea mays.
  ORGANISM  Zea mays
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Zea.
REFERENCE   1  (bases 1 to 481)
  AUTHORS   Hueros,G., Royo,J., Maitz,M., Salamini,F. and Thompson,R.D.
  TITLE     Evidence for factors regulating transfer cell-specific expression
            in maize endosperm
  JOURNAL   Plant Mol. Biol. 41 (3), 403-414 (1999)
  MEDLINE   20064976
REFERENCE   2  (bases 1 to 481)
  AUTHORS   Thompson,R.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAR-1999) Thompson R.D., Plant Breeding, MPI For
            Plant Breeding Research, Carl-von-Linne-weg 10, D-50829 Koeln,
            GERMANY
FEATURES             Location/Qualifiers
     source          1..481
                     /organism="Zea mays"
                     /variety="A69Y"
                     /db_xref="taxon:4577"
                     /country="Argentina"
     sig_peptide     44..121
                     /gene="betl2"
     CDS             44..331
                     /gene="betl2"
                     /codon_start=1
                     /product="BETL2 protein"
                     /protein_id="CAB44662.1"
                     /db_xref="GI:5042329"
                     /db_xref="SPTREMBL:Q9XGE0"
                     /translation="MAKCSSFQGLFWLLSMILLASFVAHARTTSGQTKEDSNARNMTM
                     TKTRASGNILVSRNDDGPCYLDSGLNEYVCRKTNKCYKSLVLCVASCQPSS"
     gene            44..331
                     /gene="betl2"
     mat_peptide     119..328
                     /gene="betl2"
                     /product="BETL2 protein"
BASE COUNT      156 a    100 c    107 g    118 t
ORIGIN



  Query Match             79.6%;  Score 353.4;  DB 8;  Length 481;
  Best Local Similarity   88.7%;  Pred. No. 4.2e-98;
  Matches  392;  Conservative    0;  Mismatches   49;  Indels    1;  Gaps    1;

Qy       3 aaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaat 62
           | |  | ||   | || | || | |||| ||| ||  || | || || | ||||| || |
Db      13 AGACTATTGTAGCTCATATCATCTGTCACCCATGGCGAAGTGCAGCAGCTTCCAAGGATT 72

Qy      63 aatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaa 122
           | || || ||| ||||| || || || | |  || ||| ||| || ||| | ||||| ||
Db      73 ATTCTGGTTGCTTTCCATGATTCTTCTAGCATCCTTTGTTGCTCATGCACG-CACAACAA 131

Qy     123 gtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcat 182
           |||||||||||||||||||||||||||||||||| ||||||||||| |||||||||||||
Db     132 GTGGGCAAACCAAAGAGGACAGCAATGCTAGGAACATGACGATGACCAAGACGAGGGCAT 191

Qy     183 cgggcaacatacttgttagccgtaatgacgacgggccatgctatctagattccggtctta 242
           | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     192 CAGGCAACATACTTGTTAGCCGTAATGACGACGGGCCATGCTATCTAGATTCCGGTCTTA 251

Qy     243 atgagtacgtctgcagaaagactaataagtgctataagagcttggtgctctgcgtggcga 302
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     252 ATGAGTACGTCTGCAGAAAGACTAATAAGTGCTATAAGAGCTTGGTGCTCTGCGTGGCGA 311

Qy     303 gttgtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagaca 362
           ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Db     312 GTTGTCAACCATCATCATGAATTCATGATACTGCGGAGACATCATGATACTGCGGAGACA 371

Qy     363 gacggccagagatgangctagctagatgccgtttcaccannatattatgtaacacccaaa 422
           |||||| |||||||| ||||||||||||| |||||||||  |||||||||||||||||||
Db     372 GACGGCGAGAGATGAGGCTAGCTAGATGCTGTTTCACCAAAATATTATGTAACACCCAAA 431

Qy     423 tctcccattttaagaaataaat 444
           ||||||||||||||||||||||
Db     432 TCTCCCATTTTAAGAAATAAAT 453
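
The Matches, Mismatches and Indels counts printed above each alignment can be recomputed from the aligned Qy and Db strings. The sketch below is illustrative only; it assumes that a column counts as a match only when both residues are the same unambiguous base, so a query "n" is treated as a mismatch, and that "-" marks a gap.

```python
def alignment_stats(qy, db):
    """Count (matches, mismatches, indels) over two equal-length aligned strings."""
    assert len(qy) == len(db), "aligned strings must have equal length"
    matches = mismatches = indels = 0
    for q, d in zip(qy.lower(), db.lower()):
        if q == "-" or d == "-":
            indels += 1            # gap column
        elif q == d and q in "acgt":
            matches += 1           # identical unambiguous bases
        else:
            mismatches += 1        # differing bases, or an ambiguity code
    return matches, mismatches, indels


# Query positions 363-422 against database positions 372-431 of RESULT 1.
qy = "gacggccagagatgangctagctagatgccgtttcaccannatattatgtaacacccaaa"
db = "GACGGCGAGAGATGAGGCTAGCTAGATGCTGTTTCACCAAAATATTATGTAACACCCAAA"
print(alignment_stats(qy, db))

# Best Local Similarity from the reported totals: matches / aligned columns.
print(f"{100 * 392 / (392 + 49 + 1):.1f}%")
```

Summing the per-block counts over all eight blocks of RESULT 1 gives the reported 392 matches, 49 mismatches and 1 indel, and 392 of 442 aligned columns corresponds to the printed Best Local Similarity of 88.7%.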



RESULT 2
AX015683
LOCUS       AX015683      379 bp    DNA             PAT       07-SEP-2000
DEFINITION  Sequence 1 from Patent WO9950427.
ACCESSION   AX015683
VERSION     AX015683.1  GI:10041512
KEYWORDS
SOURCE      Zea mays.
  ORGANISM  Zea mays
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Zea.
REFERENCE   1  (bases 1 to 379)
  AUTHORS   Yan,G., Salamini,F., Thompson,R.D. and Hueros,G.
  TITLE     Novel basal endosperm transfer cell layer (betl) specific genes
  JOURNAL   Patent: WO 9950427-A 1 07-OCT-1999;
            YAN GUO (DE); MAX PLACK GES ZUR FOERDERUNG D (DE); SALAMINI
            FRANCESCO (DE); THOMPSON RICHARD D (DE); HUEROS GREGORIO (ES)
FEATURES             Location/Qualifiers
     source          1..379
                     /organism="Zea mays"
                     /db_xref="taxon:4577"
     CDS             44..331
                     /note="unnamed protein product"
                     /codon_start=1
                     /protein_id="CAC07599.1"
                     /db_xref="GI:10041513"
                     /translation="MAKCSSFQGLFWLLSMILLASFVAHARTTSGQTKEDSNARNMTM
                     TKTRASGNILVSRNDDGPCYLDSGLNEYVCRKTNKCYKSLVLCVASCQPSS"
BASE COUNT      107 a     85 c     94 g     93 t
ORIGIN



  Query Match             64.0%;  Score 284;  DB 6;  Length 379;
  Best Local Similarity   87.5%;  Pred. No. 1e-76;
  Matches  322;  Conservative    0;  Mismatches   45;  Indels    1;  Gaps    1;

Qy       3 aaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaat 62
           | |  | ||   | || | || | |||| ||| ||  || | || || | ||||| || |
Db      13 AGACTATTGTAGCTCATATCATCTGTCACCCATGGCGAAGTGCAGCAGCTTCCAAGGATT 72

Qy      63 aatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaa 122
           | || || ||| ||||| || || || | |  || ||| ||| || ||| | ||||| ||
Db      73 ATTCTGGTTGCTTTCCATGATTCTTCTAGCATCCTTTGTTGCTCATGCACG-CACAACAA 131

Qy     123 gtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcat 182
           |||||||||||||||||||||||||||||||||| ||||||||||| |||||||||||||
Db     132 GTGGGCAAACCAAAGAGGACAGCAATGCTAGGAACATGACGATGACCAAGACGAGGGCAT 191

Qy     183 cgggcaacatacttgttagccgtaatgacgacgggccatgctatctagattccggtctta 242
           | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     192 CAGGCAACATACTTGTTAGCCGTAATGACGACGGGCCATGCTATCTAGATTCCGGTCTTA 251

Qy     243 atgagtacgtctgcagaaagactaataagtgctataagagcttggtgctctgcgtggcga 302
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     252 ATGAGTACGTCTGCAGAAAGACTAATAAGTGCTATAAGAGCTTGGTGCTCTGCGTGGCGA 311

Qy     303 gttgtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagaca 362
           ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Db     312 GTTGTCAACCATCATCATGAATTCATGATACTGCGGAGACATCATGATACTGCGGAGACA 371

Qy     363 gacggcca 370
           |||||| |
Db     372 GACGGCGA 379
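
The report names the scoring table (IDENTITY_NUC) and the gap penalties (Gapop 10.0, Gapext 1.0) but does not print the per-column values. The sketch below assumes +1 per identical column, -0.6 per mismatch, 0 for a column involving the ambiguity code n, and a cost of Gapop + Gapext x length for each gap. These values are an inference made for illustration, not taken from GenCore documentation; they are chosen because they reproduce the scores printed for the first results. All names are hypothetical.

```python
def sw_score(matches, mismatches, n_columns=0, gap_lengths=(),
             match=1.0, mismatch=-0.6, gapop=10.0, gapext=1.0):
    """Rebuild an alignment score from its column counts (assumed scoring values).

    n_columns: mismatch columns that involve the ambiguity code 'n',
               assumed here to score 0 instead of the mismatch penalty.
    gap_lengths: length of each gap; each costs gapop + gapext * length.
    """
    score = matches * match + (mismatches - n_columns) * mismatch
    score -= sum(gapop + gapext * length for length in gap_lengths)
    return round(score, 1)


print(sw_score(392, 49, n_columns=3, gap_lengths=[1]))  # RESULT 1 counts
print(sw_score(322, 45, gap_lengths=[1]))               # RESULT 2 counts
```

With these assumed values the two calls give 353.4 and 284.0, the scores reported for RESULT 1 and RESULT 2.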



RESULT 3
ZMA297902
LOCUS       ZMA297902     592 bp    mRNA            PLN       11-JAN-2001
DEFINITION  Zea mays mRNA for basal layer antifungal peptide (bap-3a gene)
ACCESSION   AJ297902
VERSION     AJ297902.1  GI:12214248
KEYWORDS    bap-3a gene; basal layer antifungal peptide.
SOURCE      Zea mays.
  ORGANISM  Zea mays
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Zea.
REFERENCE   1  (bases 1 to 592)
  AUTHORS   Serna Sanz,A. and Thompson,R.D.
  TITLE     Maize endosperm secretes a novel antifungal protein into adjacent
            maternal tissue
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 592)
  AUTHORS   Serna,A.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-OCT-2000) Serna A., Plant Physiology, Max Planck
            Institut, Carl von Linne Weg 10, Cologne, 50829, GERMANY
FEATURES             Location/Qualifiers
     source          1..592
                     /organism="Zea mays"
                     /variety="A69Y"
                     /db_xref="taxon:4577"
                     /tissue_type="endosperm"
                     /dev_stage="7 days after pollination"
     sig_peptide     49..132
                     /gene="bap-3a"
     CDS             49..339
                     /gene="bap-3a"
                     /function="putative antifungal peptide"
                     /codon_start=1
                     /product="basal layer antifungal peptide"
                     /protein_id="CAC21606.1"
                     /db_xref="GI:12214249"
                     /translation="MVKILDHISIRGFFLLFMVLVASFVGHAQIIRGETKEDNDTKSM
                     TMTTMRPGSYVTSMDEKSSLCFEDIKTLWYICRTTYHLYRTLKDCLSHCNSM"
     gene            49..339
                     /gene="bap-3a"
     mat_peptide     133..336
                     /gene="bap-3a"
                     /product="bap-3a protein"
BASE COUNT      196 a    102 c    115 g    179 t
ORIGIN



  Query Match             13.2%;  Score 58.8;  DB 8;  Length 592;
  Best Local Similarity   54.7%;  Pred. No. 3.1e-07;
  Matches  117;  Conservative    0;  Mismatches   97;  Indels    0;  Gaps    0;

Qy      93 cacccttggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgcta 152
           || |||| ||     |   |   | || ||| ||  ||||||||||||||| | |  | |
Db     113 CATCCTTTGTTGGTCATGCACAGATAATAAGAGGTGAAACCAAAGAGGACAACGACACCA 172

Qy     153 ggaaaatgacgatgacaaagacgagggcatcgggcaacatacttgttagccgtaatgacg 212
            ||  |||||||||||||  | |||  ||    || |  ||  |   |    | |  |
Db     173 AGAGCATGACGATGACAACAATGAGACCAGGAAGCTATGTAACTAGCATGGATGAAAAAT 232

Qy     213 acgggccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagt 272
              |    |||| |   |||        |     |||| |||||||||  ||| || |
Db     233 CTAGCTTGTGCTTTGAGGATATAAAAACTTTATGGTACATCTGCAGAACAACTTATCACC 292

Qy     273 gctataagagcttggtgctctgcgtggcgagttg 306
             |||| ||  |||  |   ||| || ||  |||
Db     293 TTTATAGGACATTGAAGGATTGCCTGTCGCATTG 326



RESULT 4
ZMA297903
LOCUS       ZMA297903     579 bp    mRNA            PLN       11-JAN-2001
DEFINITION  Zea mays mRNA for basal layer antifungal peptide (bap-3b gene).
ACCESSION   AJ297903
VERSION     AJ297903.1  GI:12214250
KEYWORDS    bap-3b gene; basal layer antifungal peptide.
SOURCE      Zea mays.
  ORGANISM  Zea mays
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Zea.
REFERENCE   1  (bases 1 to 579)
  AUTHORS   Serna Sanz,A. and Thompson,R.D.
  TITLE     Maize endosperm secretes a novel antifungal protein into adjacent
            maternal tissue
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 579)
  AUTHORS   Serna,A.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-OCT-2000) Serna A., Plant Physiology, Max Planck
            Institut, Carl von Linne Weg 10, Cologne, 50829, GERMANY
FEATURES             Location/Qualifiers
     source          1..579
                     /organism="Zea mays"
                     /variety="A69Y"
                     /db_xref="taxon:4577"
                     /tissue_type="endosperm"
                     /dev_stage="7 days after pollination"
     sig_peptide     25..108
                     /gene="bap-3b"
     CDS             25..312
                     /gene="bap-3b"
                     /function="putative antifungal peptide"
                     /codon_start=1
                     /product="basal layer antifungal peptide"
                     /protein_id="CAC21607.1"
                     /db_xref="GI:12214251"
                     /translation="MVKSLDHITIRGLFLLFMFLVASFVGHAQIIRGETKENKDTNSM
                     TMTTRPGSYVISMDEKSSLCFLDPRTLWYICKITYRLFRTLKDCLEFCHSI"
     gene            25..312
                     /gene="bap-3b"
     mat_peptide     109..309
                     /gene="bap-3b"
                     /product="bap-3b protein"
BASE COUNT      189 a     99 c    114 g    177 t
ORIGIN



  Query Match              9.5%;  Score 42.4;  DB 8;  Length 579;
  Best Local Similarity   52.2%;  Pred. No. 0.035;
  Matches  119;  Conservative    0;  Mismatches  106;  Indels    3;  Gaps    1;

Qy      96 ccttggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctagga 155
           |||| ||     |   |   | || ||| ||  ||||||| ||| | |   |  |||  |
Db      92 CCTTTGTTGGTCATGCACAGATAATAAGAGGTGAAACCAAGGAGAATAAGGACACTAACA 151

Qy     156 aaatgacgatgacaaagacgagggcatcgggcaacatacttgttagccgtaatgacgacg 215
             ||||||||||||   || ||  ||    || |  || ||   |    | |  |
Db     152 GCATGACGATGACA---ACAAGACCAGGAAGCTATGTAATTAGCATGGATGAAAAATCTA 208

Qy     216 ggccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgct 275
           |    |||| ||| ||| |  |   |     |||| |||||| ||  ||  ||      |
Db     209 GCTTGTGCTTTCTGGATCCAAGAACTCTATGGTACATCTGCAAAATAACATATCGCCTTT 268

Qy     276 ataagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaa 323
            || ||  |||  |   ||| ||| |  ||| ||      || ||| |
Db     269 TTAGGACATTGAAGGATTGCTTGGAGTTTTGCCACAGTATATGATGCA 316



RESULT 5
I66494/c
LOCUS       I66494       7218 bp    DNA             PAT       28-DEC-1997
DEFINITION  Sequence 14 from patent US 5670367.
ACCESSION   I66494
VERSION     I66494.1  GI:2724471
KEYWORDS
SOURCE      Unknown.
  ORGANISM  Unknown.
            Unclassified.
REFERENCE   1  (bases 1 to 7218)
  AUTHORS   Dorner,F., Scheiflinger,F. and Falkner,F.Gunter.
  TITLE     Recombinant fowlpox virus
  JOURNAL   Patent: US 5670367-A 14 23-SEP-1997;
FEATURES             Location/Qualifiers
     source          1..7218
                     /organism="unknown"
BASE COUNT     1944 a   1491 c   1486 g   1929 t    368 others
ORIGIN



  Query Match              9.3%;  Score 41.4;  DB 6;  Length 7218;
  Best Local Similarity    3.1%;  Pred. No. 0.096;
  Matches   12;  Conservative  211;  Mismatches  163;  Indels    0;  Gaps    0;

Qy       2 gaaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaa 61
           ||| |   ||||||  :::: ::  :  ::  ::::: :::   :: ::     ::::::
Db    1448 GAAGAATTTGGTACRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1389

Qy      62 taatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaa 121
            ::   ::: :      :::::      ::  :     :: :   ::: :::  : ::::
Db    1388 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1329

Qy     122 agtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggca 181
           :: ::: :::  :::::::: :: :: :  ::::::: :: :: :: ::::: ::::: :
Db    1328 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1269

Qy     182 tcgggcaacatacttgttagccgtaatgacgacgggccatgctatctagattccggtctt 241
             ::: :: : :   :  ::  : :: :: :: :::  : :  :   :::    ::
Db    1268 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1209

Qy     242 aatgagtacgtctgcagaaagactaataagtgctataagagcttggtgctctgcgtggcg 301
           :: ::: : :   : :::::::  :: ::: :  : :::::   :: :    : : :: :
Db    1208 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1149

Qy     302 agttgtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagac 361
           ::  :  ::  :  :  : :::   :::: :  : ::::: :  : :: :  : :::::
Db    1148 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1089

Qy     362 agacggccagagatgangctagctag 387
           ::: ::  :::::: :: :  :: | |
Db    1088 RRRRRRRRRRRRRRRRRRRRRRATCG 1063
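
In RESULT 5 most database positions carry the IUPAC ambiguity code R (A or G), and the report counts 211 "Conservative" columns. The sketch below assumes that a conservative column is one where the two symbols are not identical bases but their IUPAC base sets overlap, which is how the ":" marks in the alignment are read here. Only the subset of IUPAC codes needed for this example is included, and all names are illustrative.

```python
# Subset of IUPAC nucleotide codes (assumption: enough for this example).
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "r": "ag", "y": "ct"}


def column_symbol(q, d):
    """Return '|' for identical bases, ':' for IUPAC-compatible ones, ' ' otherwise."""
    q, d = q.lower(), d.lower()
    if q == d and q in "acgt":
        return "|"
    if q in IUPAC and d in IUPAC and set(IUPAC[q]) & set(IUPAC[d]):
        return ":"
    return " "  # incompatible bases, or a code outside the table (e.g. 'n')


qy = "agacggccagagatgangctagctag"   # query positions 362-387
db = "RRRRRRRRRRRRRRRRRRRRRRATCG"   # database positions 1088-1063
line = "".join(column_symbol(q, d) for q, d in zip(qy, db))
print(line.count("|"), line.count(":"))
```

For this last alignment block the sketch finds 2 identical and 15 compatible columns; counting the purines (a or g) of the query that face an R over the whole alignment gives 211, the Conservative figure printed in the report.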



RESULT 6
CELC30B5/c
LOCUS       CELC30B5    41334 bp    DNA             INV       11-APR-2001
DEFINITION  Caenorhabditis elegans cosmid C30B5, complete sequence.
ACCESSION   U23450
VERSION     U23450.1  GI:733552
KEYWORDS    HTG.
SOURCE      Caenorhabditis elegans.
  ORGANISM  Caenorhabditis elegans
            Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida;
            Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis.
REFERENCE   1  (bases 1 to 41334)
  AUTHORS   The C. elegans Genome Sequencing Consortium, Washington University
            Genome Sequencing Center, St. Louis U.S.A. and the Sanger Centre,
            Hinxton, U.K.,C.
  TITLE     Genome sequence of the nematode C. elegans: a platform for
            investigating biology. The C. elegans Sequencing Consortium
  JOURNAL   Science 282 (5396), 2012-2018 (1998)
  MEDLINE   99069613
REFERENCE   2  (bases 1 to 41334)
  AUTHORS   Du,Z.
  TITLE     The sequence of C. elegans cosmid C30B5
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 41334)
  AUTHORS   Waterston,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-JUL-1995)
REFERENCE   4  (bases 1 to 41334)
  AUTHORS   Waterston,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-APR-2001) Department of Genetics, Washington
            University, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA
COMMENT     Submitted by:
            Genome Sequencing Center
            Department of Genetics, Washington University,
            St. Louis, MO 63110, USA, and
            Sanger Centre, Hinxton Hall
            Cambridge CB10 1RQ, England
            e-mail: rw@nematode.wustl.edu and jes@sanger.ac.uk

            NOTICE: This sequence may not be the entire insert of this clone.
            It may be shorter because we only sequence overlapping sections
            once, or longer because we provide a small overlap between
            neighboring submissions.

            WARNING: These data have only had automated annotation
            and have not yet been subjected to manual review of that
            annotation. We will be manually reviewing this information
            as quickly as possible and at that time this GenBank record
            will be updated and this warning removed.

            NOTES:
            Coding sequences below are predicted from computer analysis, using
            the program Genefinder (P. Green and L. Hillier, ms in preparation).
FEATURES             Location/Qualifiers
     source          1..41334
                     /organism="Caenorhabditis elegans"
                     /strain="Bristol N2"
                     /db_xref="taxon:6239"
                     /chromosome="II"
                     /clone="CELC30B5"
     gene            2450..3867
                     /gene="C30B5.3"
     CDS             join(2450..2484,2567..2802,2852..3627,3679..3867)
                     /gene="C30B5.3"
                     /note="similar to C. elegans protein C40H1.1 (similar to
                     ovarian protein (fly)); coded for by C. elegans cDNA
                     yk302h1.5"
                     /codon_start=1
                     /protein_id="AAK31467.1"
                     /db_xref="GI:13592368"
                     /translation="MHKDAAENYDKQLELRSSPQINQILRCTSNTNATSSEIQLRNRQ
                     AVIVSNFREPDRRLGRYSKYYYHHNVGPEVYSRKVFVGGLPSCVKESDILNFFSRYGR
                     LQVDWPSKHYECKSDSDPSLCNEPISSSSYQPSSHLAMVSPPFGEINPFMRNMPAQSE
                     SSQTGGFGRISSGSIGGFLNPGMAQVARGNLGFGSTKSDGSINGDKRQHHLGYVFLLF
                     EKERSVRDLVLDCFEEEEGLFITLESSTDSIRVQIRPWLLADAEFLMDFNVPINTKLV
                     AFIGGVPRPLKAVELAHFFEQTYGHVVCVGIDIDNKFKYPRGSGRVAFSDYDAYVQAI
                     TDRYIVLDHEDIHKRVEIKPYFFHNQSCEECSGRYHRQHAPYFCPSLECFQYYCEPCW
                     HKMHSHPSRFHHMPVVKGV"
     gene            complement(4589..5670)
                     /gene="C30B5.4"
     CDS             complement(join(4589..5188,5315..5444,5492..5670))
                     /gene="C30B5.4"
                     /note="Contains similarity to Pfam domain: PF00076 (rrm),
                     Score=81.4, E-value=6.1e-21, N=1"
                     /codon_start=1
                     /evidence=not_experimental
                     /protein_id="AAK31468.1"
                     /db_xref="GI:13592369"
                     /translation="MNPITNIKNQNRMNERELSLGYAGDLKKSWHQTYKDSAWIYIGG
                     LSYALSEGDVIAVFSQYGEVMNINLIRDKDTGKSKGFAFLCYKDQRSTILAVDNFNGI
                     SLHKRMIRVDHVEEYKVPKYKEDADDETKRLWEEGCAPKPVMREAAPMEVQEQRIKKA
                     KEVLLDIGDVDEELLKKIKKDKKKAKKEKKREKKRAKKIRKLEKKAARDPDGDWNNKA
                     KLIDKVVAEDDLYGENKHFDFGKKKEVEEVKHNPRPDFEKADWRDIEIWKVIREREKA
                     EKAARGETSEAWGPEDHYVSKRYQGR"
     gene            5970..6878
                     /gene="C30B5.2"
     CDS             join(5970..5985,6152..6227,6271..6454,6541..6579,
                     6605..6703,6762..6878)
                     /gene="C30B5.2"
                     /note="coded for by C. elegans cDNA yk402a11.3; coded for
                     by C. elegans cDNA yk402a11.5; coded for by C. elegans
                     cDNA yk455a11.5; coded for by C. elegans cDNA yk745c8.5;
                     coded for by C. elegans cDNA yk745c8.3"
                     /codon_start=1
                     /product="Hypothetical protein C30B5.2"
                     /protein_id="AAK31472.1"
                     /db_xref="GI:13592373"
                     /translation="MGGVRAVAALAFAGVVGLTFLVLGCALPRYGTWTPMFVITFYVL
                     SPVPLLIARRFQEDMTGTNACIELALFITTGIVISAFALPIVLAHAGTIANSACFLVN
                     TGSHISTCIVMTTVEAGASLCSKSFTSLLISCQVIVIAMSACFLIFIANSINFSVIIF
                     YFRIFNGEDMNGMSLW"
     gene            7525..10141
                     /gene="C30B5.1"
     CDS             join(7525..7627,7678..7766,7812..7961,8028..8289,
                     8338..8722,9104..9364,9412..9970,10022..10141)
                     /gene="C30B5.1"
                     /note="coded for by C. elegans cDNA yk12e2.3; coded for by
                     C. elegans cDNA yk12e2.5; coded for by C. elegans cDNA
                     yk18d7.5; coded for by C. elegans cDNA yk44h5.3; coded for
                     by C. elegans cDNA yk44h5.5; coded for by C. elegans cDNA
                     yk54d12.5; coded for by C. elegans cDNA yk122g2.5; coded
                     for by C. elegans cDNA yk146c11.5; coded for by C. elegans
                     cDNA yk209b3.3; coded for by C. elegans cDNA yk209b3.5;
                     coded for by C. elegans cDNA yk218c3.3; coded for by C.
                     elegans cDNA yk218c3.5; coded for by C. elegans cDNA
                     yk285h2.3; coded for by C. elegans cDNA yk285h10.5; coded
                     for by C. elegans cDNA yk294c3.5; coded for by C. elegans
                     cDNA yk296b11.3; coded for by C. elegans cDNA yk296b11.5;
                     coded for by C. elegans cDNA yk297d3.5; coded for by C.
                     elegans cDNA yk320f12.5; coded for by C. elegans cDNA
                     yk346h3.3; coded for by C. elegans cDNA yk346h3.5; coded
                     for by C. elegans cDNA yk347a4.5; coded for by C. elegans
                     cDNA yk350a7.5; coded for by C. elegans cDNA yk351b1.5;
                     coded for by C. elegans cDNA yk397e7.3; coded for by C.
                     elegans cDNA yk397e7.5; coded for by C. elegans cDNA
                     yk408e10.5; coded for by C. elegans cDNA yk419c3.3; coded
                     for by C. elegans cDNA yk419c3.5; coded for by C. elegans
                     cDNA yk430g3.3; coded for by C. elegans cDNA yk430g3.5;
                     coded for by C. elegans cDNA yk442h4.5; coded for by C.
                     elegans cDNA yk451h6.3; coded for by C. elegans cDNA
                     yk451h6.5"
                     /codon_start=1
                     /product="Hypothetical protein C30B5.1"
                     /protein_id="AAK31466.1"
                     /db_xref="GI:13592367"
                     /translation="MNKDNSPSESLVNSSASEDKTVKTSMSANNPSINDSSLPERLPA
                     MCGCDSQNREESANSGVQLSDHANESLSKQADDVIYEGGLEDKDSVFRAIIEDDDACE
                     VLDDGIDNCEVEIPVAIECHLCNEMMNLCLRRTRYRGQAREYPAYRCNRKGCQTFRSI
                     RKVFGNCMSDVDNSGDATTRTMIYHPPHKVRKTSPGEFSDDDSKEFVVQRVYIPKIDN
                     RNKDQSVKSLSVTDRMRKANQDRAIVFSEFADQLRRDIVANKRVRIKKHVEEDQQGTL
                     FYISKELNPQEVIELQDVIIKTLISMRKIPPPMTMDDLPLFANCPYSKKAVDAGEFDM
                     SETEERPRPKLPKTIYQNTRTHSDSLTERSRHFTYDELKHSMEANAEGPSTCRIGISR
                     SERNSESLQWIGLKTYEDSTDGFAKSVNSEGERQQGINGQIDDEVFEDESRFKPVASM
                     QDVMRTLSTPSTLRPPVSPLMKSFSSMDSSNFKNPINAHPPMFFSDPHFLINNNQFNS
                     PTPFYALNSYFPSFPMDSSQVYQNATDEHFTPSGASEVVCTEQTPGHHFTYIHPDSGV
                     RLTDPSAFLTSFVPSPITKVPPINFNDETSRSRSASAPSTSVYNTGEFERSHKKYENE
                     EDEKLMDNQNFSDEQPKE"
     gene            complement(10349..16920)
                     /gene="C30B5.5"
     CDS             complement(join(10349..10585,10628..10734,11184..11582,
                     11628..11781,12053..12141,12885..13153,13435..13689,
                     14032..14232,14680..14807,14991..15107,15299..15380,
                     16335..16490,16742..16920))
                     /gene="C30B5.5"
                     /note="similar to adrenergic receptor"
                     /codon_start=1
                     /evidence=not_experimental
                     /protein_id="AAK31469.1"
                     /db_xref="GI:13592370"
                     /translation="MDVIGNITDLSPTVSGIPDECGLEPHDFLEVKFFLISVVGTLIG
                     LFGLFGNATTALILTRPSMRNPNNLFLTALAVFDSCLLITAFFIYAMEYIIEYTAAFD
                     LYVAWLTYLRFAFALSHISQTGSVYITVAVTIERYLAEFSRLIFQVTVNPSCPDGSNW
                     QSYILLPSAMASNPIYQQVYSLWVTNFVMVFFPFLTLLLFNAIIAYTIRQSLEKYDFH
                     NQKSVVAALSASVNLPRNIAGYCPSDCLCHVPLSSFTLRGLAPRRTVVYDCSRGSASS
                     VATSLTSPESFRISSRNELKEKSREATLVLVIIVFIFLGCNFWGFVLTLLERIMGQET
                     LMVEHHIFYTFSREAINFLAIINSSINFVIYLLFGKDFRKELVVVYGCGIRGISLRLP
                     VQDKFVIWRHWKRTKSRISMNTTNRTRHKISLPQTLVEHANLERLEETRFLAHHEDGV
                     QTQVSPIHALRNGSTPKIDTLQDLTSNGRPCKTSIIDDNGTVGKPRCNAFSIKIYLSL
                     FNAQKMKILVNILILIRITSGMLYDGKYFDMKDRIRAECIIKKAMSDYKYLGNNDKSV
                     NGEPCVPWIEVTESWLPTASDSQKMKKQASESFHHSKCRNIKLSVVNPMYSVSTPSAI
                     ETGIPSGIHGPWCFIDKVGTNGTESFKYTPVTCFNYCDETKVATDNEKMRLTENGYAI
                     LKQNYNPALLDPIEKLFTEYQFGNMKYYTFKKSREQPPQYLELRQKVFIALCLVFCVV
                     IAWILSCYFLKKHSQKLHQKKQKALEGFYESNNLADIKLQADLRREREEA"
     gene            complement(27490..31388)
                     /gene="C30B5.6"
     CDS             complement(join(27490..27593,27675..27782,27835..27927,
                     27971..28064,28802..29630,29761..30035,30079..30218,
                     30288..30444,31125..31247,31308..31388))
                     /gene="C30B5.6"
                     /codon_start=1
                     /evidence=not_experimental
                     /product="Hypothetical protein C30B5.6"
                     /protein_id="AAK31470.1"
                     /db_xref="GI:13592371"
                     /translation="MSVSTAFLLLLWSSIAGSLPLAPASPDFKADFSKLLKEAHQTDL
                     MPVVEEDELQSLRHNGFQVFYTFHEQLHHLRQKTVGRNSATITEEMARIRNSIEDISE
                     FSLDSNEVDEAINDITVPYGTKMKKSSKFAELSEIHNTDSSNKALPPFAFSEDQQESV


  Query Match              8.9%;  Score 39.4;  DB 3;  Length 41334;
  Best Local Similarity   47.7%;  Pred. No. 0.49;
  Matches  115;  Conservative    0;  Mismatches  126;  Indels    0;  Gaps    0;

Qy     126 ggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcatcgg 185
           || | |  | | ||  | | ||| | | ||| ||| ||||||  ||    |  |  |  |
Db    5156 GGAAGAATATAAAGTCCCGAAATACAAAGAAGATGCCGATGATGAAACGAAACGATTATG 5097

Qy     186 gcaacatacttgttagccgtaatgacgacgggccatgctatctagattccggtcttaatg 245
           | || |    |||   ||  ||  |  |  |        | |     |      | ||
Db    5096 GGAAGAAGGATGTGCTCCAAAACCAGTAATGAGAGAAGCAGCACCTATGGAAGTTCAAGA 5037

Qy     246 agtacgtctgcagaaagactaataagtgctataagagcttggtgctctgcgtggcgagtt 305
           |    |  |  |||||| | || ||||  ||| |||  |||| | | |   ||  |||||
Db    5036 ACAGAGAATCAAGAAAGCCAAAGAAGTATTATTAGATATTGGAGATGTTGATGAAGAGTT 4977

Qy     306 gtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagacagac 365
           |   |  |  ||||  ||    |   |    | | | |    ||    ||||  | |||
Db    4976 GCTGAAAAAAATCAAAAAAGATAAGAAGAAAGCAAAAAAGGAGAAGAAGCGGGAAAAGAA 4917

Qy     366 g 366
           |
Db    4916 G 4916
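
The CDS locations in the feature table of RESULT 6 are given as join(...) lists of exon spans. A minimal sketch follows for computing the coding length from such a location string; the helper name is hypothetical and the regular expression only handles simple start..end spans, not partial locations such as <1..100.

```python
import re


def location_length(location):
    """Total number of bases covered by the start..end spans in a location string."""
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    return sum(int(end) - int(start) + 1 for start, end in spans)


# CDS of gene C30B5.3 from the feature table of RESULT 6.
cds = "join(2450..2484,2567..2802,2852..3627,3679..3867)"
bases = location_length(cds)
print(bases, bases // 3)
```

The four exons cover 1236 bases, or 412 codons, which would correspond to a 411-residue protein followed by a stop codon.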



RESULT 7
AC080021
LOCUS       AC080021   188254 bp    DNA             HTG       03-FEB-2001
DEFINITION  Mus musculus clone RP23-422L7, WORKING DRAFT SEQUENCE, 12 unordered
            pieces.
ACCESSION   AC080021
VERSION     AC080021.2  GI:11138185
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE      house mouse.
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 188254)
  AUTHORS   McCombie,W.R., Baker,J.P., Bahret,A., Bal,H., Balija,V.,
            Dedhia,N.N., de la Bastide,M., Huang,E.N., King,L., Kirchoff,K.A.,
            Miller,B., Nascimento,L.U., O'Shaughnessy,A.L., Preston,R.R.,
            Rodriguez,S., Santos,L., Shah,R.S., Spiegel,L.A., Toth,K., Vil,M.D.
            and Zutavern,T.
  TITLE     Mouse Genomic Sequence
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 188254)
  AUTHORS   McCombie,W.R.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-SEP-2000) Lita Annenberg Hazen Genome Sequencing
            Center, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring
            Harbor, NY 11724, USA
COMMENT     On Nov 11, 2000 this sequence version replaced gi:10280739.
            Genome Center
            Center: Lita Annenberg Hazen Genome Center, Cold Spring Harbor
            Laboratory
            Center code: CSHL
            Web site: http://www.cshl.org/genseq
            Contact: mccombie@cshl.org
            Project Information
            Center project name: RP23-422L7
            Center clone name: RP23-422L7

            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 12 contigs. The true order of the pieces
            * is not known and their order in this sequence record is
            * arbitrary. Gaps between the contigs are represented as
            * runs of N, but the exact sizes of the gaps are unknown.
            * This record will be updated with the finished sequence
            * as soon as it is available and the accession number will
            * be preserved.
            *        1    51285: contig of 51285 bp in length
            *                    gap of unknown length
            *                    contig of 39153 bp in length
            *                    gap of unknown length
            *                    contig of 27097 bp in length
            *                    gap of unknown length
            *                    contig of 14303 bp in length
            *                    gap of unknown length
            *                    contig of 11157 bp in length
            *                    gap of unknown length
            *                    contig of 10541 bp in length
            *                    gap of unknown length
            *   155257   165351: contig of 10095 bp in length
            *   165352   165637: gap of unknown length
            *   165638   174626: contig of 8989 bp in length
            *   174627   174912: gap of unknown length
            *   174913   183138: contig of 8226 bp in length
            *   183139   183424: gap of unknown length
            *   183425   185843: contig of 2419 bp in length
            *   185844   186129: gap of unknown length
            *   186130   187955: contig of 1826 bp in length
            *   187956   188241: gap of unknown length
            *   188242   188254: contig of 13 bp in length.
FEATURES             Location/Qualifiers
     source          1..188254
                     /organism="Mus musculus"
                     /db_xref="taxon:10090"
                     /clone="RP23-422L7"
BASE COUNT    56697 a  36632 c  35227 g  56459 t   3239 others
ORIGIN

Query Match 8.5%; Score 37.8; DB 2; Length 188254; 

Best Local Similarity 50.8%; Pred. No. 1.8; 

Matches 90; Conservative 0; Mismatches 87; Indels 0; Gaps 0; 

Qy 109 gcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgac 168 

I I I I I I I I I I I I I I I III I I I II I I I I 

Db 103606 GCAAAAAAGAAAAAATGTGAAATGGAAAATCTCATGGCTGAGAACAAGTGAGAGAATGAG 103665

Qy 169 aaagacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgctatct 228 

I I I I I I I I III I I I I I I III I I I I I I I I I I I I I I 
Db 103666 AAAAACCAGTGTCCCGGCATTCCTCCTTGAGAGCACTTCTTGTGACTGCACATTCTTTCT 103725 

Qy 229 agattccggtcttaatgagtacgtctgcagaaagactaataagtgctataagagctt 285 

I I I II I I I I I I I I I I II I I I I I II III II 
Db 103726 AGGTTCCTACCTTGAAGATTTCAACCCCAATAGTACCAAGTTGAGGAGCAAGCCATT 103782 
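
The percentages printed with each alignment can be reproduced from the counts on the same lines. The sketch below is an inference from the reported numbers, not GenCore's documented method: it assumes "Best Local Similarity" is matches divided by aligned columns, and "Query Match" is the score divided by the perfect score of 444 given in the run header. The function names are hypothetical.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def best_local_similarity(matches: int, mismatches: int, indels: int = 0) -> float:
    """Assumed definition: matches over all aligned columns."""
    return percent(matches, matches + mismatches + indels)


def query_match(score: float, perfect_score: float) -> float:
    """Assumed definition: alignment score over the query's perfect score."""
    return percent(score, perfect_score)


# Counts taken from the alignment above (Matches 90, Mismatches 87, Score 37.8).
similarity = best_local_similarity(90, 87)
match = query_match(37.8, 444)
print(similarity, match)
```

Under these assumptions the values agree with the 50.8% and 8.5% reported for this hit.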



RESULT 8 
AC020602 

LOCUS AC020602 170225 bp DNA HTG 12-MAY-2001
DEFINITION Homo sapiens chromosome 2 clone RP11-461M18, WORKING DRAFT
SEQUENCE, 5 unordered pieces.
ACCESSION AC020602
VERSION AC020602.5 GI:14029092
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ACTIVEFIN.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 170225)
AUTHORS Waterston,R.H.
TITLE The sequence of Homo sapiens clone
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 170225)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (05-JAN-2000) Genome Sequencing Center, Washington
University School of Medicine, 4444 Forest Park Parkway, St. Louis,
MO 63108, USA
COMMENT On May 12, 2001 this sequence version replaced gi:13992766.

Genome Center
Center: Washington University Genome Sequencing Center
Center code: WUGSC
Web site: http://genome.wustl.edu/gsc/index.shtml

Project Information
Center project name: H_NH0461M18

Summary Statistics
Sequencing vector: M13; 65%
Sequencing vector: plasmid; 31%
Chemistry: Dye-primer ET; 65% of reads
Chemistry: Dye-terminator Big Dye; 31% of reads
Assembly program: Phrap; version 0.990319
Consensus quality: 166879 bases at least Q40
Consensus quality: 167963 bases at least Q30
Consensus quality: 168640 bases at least Q20
Insert size: 166000; agarose-fp
Insert size: 169825; sum-of-contigs
Quality coverage: 7.42 in Q20 bases; agarose-fp
Quality coverage: 7.36 in Q20 bases; sum-of-contigs

NOTE: This is a 'working draft' sequence. It currently
consists of 5 contigs. The true order of the pieces
is not known and their order in this sequence record is
arbitrary. Gaps between the contigs are represented as
runs of N, but the exact sizes of the gaps are unknown.
This record will be updated with the finished sequence
as soon as it is available and the accession number will
be preserved.

1 1167: contig of 1167 bp in length 

1168 1267: gap of unknown length 

1268 2828: contig of 1561 bp in length 

2829 2928: gap of unknown length 

2929 36359: contig of 33431 bp in length 
36360 36459: gap of unknown length 
36460 86624: contig of 50165 bp in length 
86625 86724: gap of unknown length 



86725 170225: contig of 83501 bp in length.
FEATURES Location/Qualifiers
source 1..170225
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="2"
/clone="RP11-461M18"
misc_feature 1..1167
/note="assembly_name:Contig42"
misc_feature 1268..2828
/note="assembly_name:Contig48"
misc_feature 2929..36359
/note="assembly_name:Contig50"
misc_feature 36460..86624
/note="assembly_name:Contig51"
misc_feature 86725..170225
/note="assembly_name:Contig52"
BASE COUNT 54678 a 34158 c 31942 g 49045 t 402 others 
ORIGIN 



Query Match 8.3%; Score 36.8; DB 2; Length 170225; 

Best Local Similarity 54.4%; Pred. No. 3.6; 

Matches 74; Conservative 0; Mismatches 62; Indels 0; Gaps 0; 

Qy 26 cgtcaaccaagggcaaattcaacaacctccaaagaataatccgggtgccttccaagaatc 85 

I I I I I I I I I I I I I I I I I I III I I I I II I I II II 
Db 112451 CTTCAACCCAAGGCATCATTAAGAAAGTGAACAGGCCAGGCGCGGTGGCTCACACCTATA 112510

Qy 86 ctccaaccacccttggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagc 145 

I I I I I I I I II I II II I I I I I I I I I I I III I I I I 

Db 112511 ATCCCAGCACTTTGGGAGGCCCAGACAGGCAGATCACGAGGTCAGGAGTTCGAGACCAGC 112570



Qy 146 aatgctaggaaaatga 161 

III I I I I I 
Db 112571 CTGGCCAACATAGTGA 112586 
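
The contig and gap layout in the AC020602 COMMENT block above can be checked mechanically. The following is a minimal sketch, not part of the search output: it assumes each layout row has the form "start end: contig of N bp in length" or "start end: gap of ...", and the helper names `parse_layout` and `check_layout` are hypothetical.

```python
import re

# Layout rows as listed in the AC020602 record.
LAYOUT = """\
1 1167: contig of 1167 bp in length
1168 1267: gap of unknown length
1268 2828: contig of 1561 bp in length
2829 2928: gap of unknown length
2929 36359: contig of 33431 bp in length
36360 36459: gap of unknown length
36460 86624: contig of 50165 bp in length
86625 86724: gap of unknown length
86725 170225: contig of 83501 bp in length.
"""

# An optional leading "*" is tolerated because some records prefix rows with it.
ROW = re.compile(r"^\*?\s*(\d+)\s+(\d+):\s+(contig|gap) of (\d+|unknown)")


def parse_layout(text):
    """Return a list of (start, end, kind, stated_size_or_None) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            start, end, kind, size = m.groups()
            stated = None if size == "unknown" else int(size)
            rows.append((int(start), int(end), kind, stated))
    return rows


def check_layout(rows, record_length):
    """Verify rows tile 1..record_length and return the summed contig length."""
    expected_start = 1
    contig_total = 0
    for start, end, kind, stated in rows:
        if start != expected_start:
            raise ValueError(f"row starting at {start} does not follow {expected_start - 1}")
        span = end - start + 1
        if kind == "contig":
            if stated is not None and stated != span:
                raise ValueError(f"contig {start}..{end} states {stated} bp, spans {span}")
            contig_total += span
        expected_start = end + 1
    if expected_start - 1 != record_length:
        raise ValueError("rows do not cover the full record length")
    return contig_total


print(check_layout(parse_layout(LAYOUT), 170225))
```

For this record the summed contig length is 169825, the same figure the record reports as "Insert size: 169825; sum-of-contigs".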



RESULT 9 

CELY34D9A 

LOCUS CELY34D9A 49307 bp DNA INV 06-APR-2001
DEFINITION Caenorhabditis elegans cosmid Y34D9A, complete sequence.
ACCESSION AC024756
VERSION AC024756.1 GI:7140309
KEYWORDS HTG.
SOURCE Caenorhabditis elegans.
ORGANISM Caenorhabditis elegans
Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida;
Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis.
REFERENCE 1 (bases 1 to 49307)
AUTHORS The C. elegans Genome Sequencing Consortium, Washington University
Genome Sequencing Center, St. Louis U.S.A. and the Sanger Centre,
Hinxton, U.K.,C.
TITLE Genome sequence of the nematode C. elegans: a platform for
investigating biology. The C. elegans Sequencing Consortium
JOURNAL Science 282 (5396), 2012-2018 (1998)
MEDLINE 99069613
REFERENCE 2 (bases 1 to 49307)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (01-MAR-2000) Genome Sequencing Center, Washington
University School of Medicine, 4444 Forest Park Parkway, St. Louis,
MO 63108, USA
REFERENCE 3 (bases 1 to 49307)
AUTHORS Waterston,R.
TITLE Direct Submission
JOURNAL Submitted (24-MAR-2000) Department of Genetics, Washington
University, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA
REFERENCE 4 (bases 1 to 49307)
AUTHORS Waterston,R.
TITLE Direct Submission
JOURNAL Submitted (06-APR-2001) Department of Genetics, Washington
University, 4444 Forest Park Avenue, St. Louis, Missouri 63108, USA
COMMENT Submitted by:
Genome Sequencing Center
Department of Genetics, Washington University,
St. Louis, MO 63110, USA, and
Sanger Centre, Hinxton Hall
Cambridge CB10 1RQ, England
e-mail: rw@nematode.wustl.edu and jes@sanger.ac.uk

NOTICE: This sequence may not be the entire insert of this clone.
It may be shorter because we only sequence overlapping sections
once, or longer because we provide a small overlap between
neighboring submissions.

NOTES:

Coding sequences below are predicted from computer analysis, using
the program Genefinder (P. Green and L. Hillier, ms in preparation).
FEATURES Location/Qualifiers 
source 1..49307
/organism="Caenorhabditis elegans"
/strain="Bristol N2"
/db_xref="taxon:6239"
/chromosome="I"
/clone="CELY34D9A"
gene complement(1791..11784)
/gene="Y34D9A.1"
CDS complement(join(1791..1910,5779..6091,8043..8493,
11369..11479,11538..11784))
/gene="Y34D9A.1"
/note="coded for by C. elegans cDNA yk56g7.3; coded for by
C. elegans cDNA yk69f5.5; coded for by C. elegans cDNA
yk56g7.5; coded for by C. elegans cDNA yk632g7.5"
/codon_start=1
/product="Hypothetical protein Y34D9A.1"
/protein_id="AAK29884.1"
/db_xref="GI:13559675"

/translation="MSAAQYARLVPKKYRSKTLPKIDRPWRPRVIAWAGPAAFYPNRF 
YEVDKWYKARIDKPEKLPEMHIIEPAEHMKSLKVLMQKSEIEQINIGFKRREVAGKAE 
KSAEDRIELERKSRHMQLKIDIDHLDVENLSIYRHFQVFDHLFGDNIFFENVQNLQVN 



FENDIVVHSGNVITANSTLKRPEITIESVGNGGGFNTLLMINLDGNALDLGKNGEIVQ 

WMISNIPDGEAISAGSEIIDYLQPLPFYGTGYHRVAFVLFRHEKPVDFQIQGNSLDTR 

IHEISKFYKKHEATITPSAIRFFQTSYDNSVKMALHGLGMTSPLYEYEHRPALKPAQR 

EFPEKPQPFDLYLDMYRDPKEVEQEMLEKRLAEVKLDYVKEPKWVDTDYVENKKKLPA 

WLHAKKLERDGVGHAKYHNDL" 

gene complement(12705..15311)
/gene="Y34D9A.4"
CDS complement(join(12705..12959,14804..15088,15141..15311))
/gene="Y34D9A.4"
/note="coded for by C. elegans cDNA yk747h4.5; coded for
by C. elegans cDNA yk110f3.5; coded for by C. elegans cDNA
yk82a1.5; coded for by C. elegans cDNA yk267c3.5; coded
for by C. elegans cDNA yk461f9.5; coded for by C. elegans
cDNA yk425b7.5; coded for by C. elegans cDNA yk292d8.5;
coded for by C. elegans cDNA yk533f3.5; coded for by C.
elegans cDNA yk470a4.5"
/codon_start=1
/product="Hypothetical protein Y34D9A.4"
/protein_id="AAK29885.1"
/db_xref="GI:13559676"

/translation="MSRRHSVVDAYSTVEGKIRDTMRELSALWDEVDMSEAMRLKRVD 

NAFTHITLLCDDMLSGEKEMIHNLKVSIREDMQNVKKMRLELEMEDFQRPAEIKDGSI 

ALMRHLQSEVKSLETEFQTRHEDQRVLIEKICNLKKRLESDFEFEYEIHAKLNFDTDL 

PPCMSPATSEMISFITPTRRQAVGPKTSSPKDTQPSTSSSSRTTTPMSKRAMTPSSIA 

SSTPSSAKKVLTRRNQFL" 

gene 14384..22639
/gene="Y34D9A.3"
CDS join(14384..14680,15517..15594,16343..16373,16423.
16567..16755,16808..16939,16997..17134,21214..21757,
21990..22522,22574..22639)
/gene="Y34D9A.3"
/note="coded for by C. elegans cDNA yk533f3.5; coded for
by C. elegans cDNA yk747h4.5; coded for by C. elegans cDNA
yk35d4.5; coded for by C. elegans cDNA yk154c9.5"
/codon_start=1
/product="Hypothetical protein Y34D9A.3"
/protein_id="AAK29886.1"
/db_xref="GI:13559677"
/translation="MSICFRYETFLPKLLKLLMLLGPSELDIFQPMLVKRLQLAVVRL
QRVQKPLREHKFRGIDVLLQQILQIIIPADIVGHILRHTLPLRDVGLETLDLVVAFFV
ARPVPEENACAFNFSLILNIFEQQIMRRRRCIPVINDITEGFAAEELEYAVFDESLFT 
SIWKSPERLSNETDFHLLLTALSREIAAENYQEPLLILKFMKKTAIGQDEVINRIFTS 
LLKQILRDETRKKLDPHVVLDFLDAGDKLFLDSETSKKAYAILKLYFLCRNRKSKLEC 
FSCGCVQKYFIELYKDGTRLHEALNCLIKSKLSDSPTIIERKPEKRIIIIFFDIMKLL 
GAEKPKIHSYFKENRTKTKLLMNSELELAVEYLRDFSDYELNWFLQQRADEDFQFLLE 
TAKNLQAPVNSDGFWRNSLEKFDENSIIFIENQLKIMATRIDRKGLALRIQQISQKMM 
QKSGKIEEKTAIFIKFPQILLEIADFSILPPNSSNLLFSTLILGFSAFYDGNEAQFND 
LFRQFLSKTIPNSTCTSPMEIIEEICEKIRIKSNFRHLPDISFLKPNPDTIITQISEI
FDTKIAKNDERIVEICSILLEKLAPQLQAFGDRGKTPFCQIVNIIVDKYVKNSVEPVN 
FQLFREVCERAVDGFAMMTEQWNFLFLSAKILTILNLTRREAGLFAVLMRKINKNEIL 
LTKLRGKKEFVQLEGSLEL" 
gene 23360..30062
/gene="Y34D9A.7"
CDS join(23360..23566,25232..25801,27316..27378,28218.
29198..29291,29380..29552,29610..29687,29832..30062)
/gene="Y34D9A.7"
/codon_start=1
/evidence=not_experimental
/product="Hypothetical protein Y34D9A.7"
/protein_id="AAK29880.1"
/db_xref="GI:13559671"

/translation="MSEDGEISLSPNSAALMDGLDDFIIDKNCSGIIPNEDLERETPS 
ADDDAEVKTSSNWAKNMRKRMFGEQCDEPEESEKKRRRTNCCFNCRGGDHSIAQCPEP 
KNFAEIRKNKLEFMNDKQQQHQPTGRISQVTEQQQEAKFKPGRLSQNLRKALSLGPDD 
IPEWVYRMRRLGFYRGYPPGYLRKSLKREFATLKIYSEDHNQEEVDNDDDEEARPAPT 
IQSEKVHFYMGFNKTYGALRDRERGRFEVPPFDIFCEMLQTFLPNSYLNFRQLDLFDG 
ARLHEVARDHEGTEKIRIREERHQRSELRKIREQQKEEEEQQQRDSAEKVEKEVTVDQ 
TISEV7VAAAAAVEEKPGTPEPVGRMKKLNFLNKKLIFFSRMLYFSKIKSKFYSRKKVK 
KFFFSKNQYSSKKLKFSSISLPKIYISRISISNFFFKVFCPKIDFFFPKFFQIFFPKT
YFSKKKKRIRKSIAGILLNTGKFPGESISEFIGTPIFSRRDVTGTWIEDTVPSLEAFS 
VGIVPFEAKEEEKPRGIFKKIMSTLKGITGRNSGDDEMKKKE" 

gene 31479..32514
/gene="Y34D9A.6"
CDS join(31479..31562,32281..32514)
/gene="Y34D9A.6"
/codon_start=1
/evidence=not_experimental
/product="Hypothetical protein Y34D9A.6"
/protein_id="AAK29881.1"
/db_xref="GI:13559672"
/translation="MSKAFVDGLLQSSKVVVFSKSYCPYCHKARAALESVNVKPDALQ
WIEIDERKDCNEIQDYLGSLTGARSVPRVFINGKFFGGGDDTAAGAKNGKLAALLKET 
GAL" 

gene complement(32800..34038)
/gene="Y34D9A.8"
CDS complement(join(32800..33005,33792..33901,33968..34038))
/gene="Y34D9A.8"
/codon_start=1
/evidence=not_experimental
/product="Hypothetical protein Y34D9A.8"
/protein_id="AAK29879.1"
/db_xref="GI:13559670"

/translation="MADSLLSNILQQEITDFPELFDMGGGAPMAREGVAQPRQTNVQA 
TVAAVKETTTITAESSGTVTIQYSHIFFAFVAFFVLSVAVVAVIRRRSRQKSGFRNRR 
GGGHGGPSILQQDSDEDDILISSMYS" 
gene complement(36238..44285)
/gene="Y34D9A.9"
CDS complement(join(36238..36345,37937..38081,39570..39817,
40470..40532,42326..42546,43973..44285))
/gene="Y34D9A.9"
/codon_start=1
/evidence=not_experimental
/product="Hypothetical protein Y34D9A.9"
/protein_id="AAK29878.1"
/db_xref="GI:13559669"

/translation="MSEYEPIGIDYTHTHTQTLRSSLQMTPCGSQRQGEKRREKKWDR 
ECFCVCVSCCVSRCWDTHTCFCCSSAALLLSHTLLRLSCSSLLSAAPHTLELGGGSLS 
SSGNRKPWQGILLFGPPGTGKSYIAKAVATEAGESTFFSISSSDLMSKWLGESEKLVK 
NLFALAREHKPSIIFIDEFLPNSYSNFRQSALFDGARLHQFNTYFSNNRRFEKRIYIP
LPDIHARKEMFRIDVGKNYNTLTDQDFKVLAERCEGYSGYDISILVKDALMQPVRRVQ 
SATHFKHVSGPSPKDPNVIAHDLLTPCSPGDPHAIAMNWLDVPGDKLANPPLSMQDIS 
RSLASVKPTVNNTDLDRLEAFKNDFGQDGQE" 



gene 44323..46794
/gene="Y34D9A.2"
CDS join(44323..44706,45476..45882,46530..46794)
/gene="Y34D9A.2"
/codon_start=1
/evidence=not_experimental
/product="Hypothetical protein Y34D9A.2"
/protein_id="AAK29882.1"
/db_xref="GI:13559673"
/translation="MYGFFNSSEIDEEYYNSTHTSAPSPILALIFTIICIIGVTGNAS
LLVYIFAKKLYQNFISSRFIGHLCFTNLIALLVLVPVIIHNVFTGVNLLQDSNMLCRI 
QVTETFSAVTWSQKVPFRFDLRKMREVSITVTVWTVIAMMNLCIAGVHLLTFARIHYE 
QLFGLTPTKLCILSWIISWLLSLPSLTNGHVAIYGPAVRTCVFSHSDSGLKFLTYTMI 
FGVFIPALFSSIAYFRILQTLFHSPIVFQSLGLYKSRFLVYFFLLGPLYALPFYILTA 



Query Match 8.1%; Score 36; DB 3; Length 49307; 

Best Local Similarity 47.0%; Pred. No. 5.6; 

Matches 111; Conservative 0; Mismatches 125; Indels 0; Gaps 0; 

Qy 96 ccttggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctagga 155 

I I I I I I I III I I II I I I I I I II I I II 

Db 16858 CCTTGATTCTGAAACTTCAAAAAAAGCATATGCAATTCTAAAATTGTATTTTCTCTGCAG 16917 



Qy 156 aaatgacgatgacaaagacgagggcatcgggcaacatacttgttagccgtaatgacgacg 215 

I I I 1 I III I I I I I I I I I I I I I II I I 

Db 16918 AAATCGAAAATCCAAGCTAGAGGTTGGTGATTTTCCGCCCGGAAAACTGAAAACCCGCCG 16977 

Qy 216 ggccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgct 275 

II II I I I I II I I I I I I I I I I I I I I I II 

Db 16978 AAAATTCCATTTTTTCCAGTGTTTTTCGTGCGGTTGTGTGCAAAAATACTTTATCGAGCT 17037 



Qy 276 ataagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattcaagat 331

II II I I I I I III MM I I I I I I I II I I I M I II 
Db 17038 CTACAAGGATGGTACGAGGCTTCACGAGGCGTTAAACTGCCTGATAAAATCAAAAT 17093
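
The CDS locations in the CELY34D9A feature table use the `join(...)` and `complement(...)` operators. As a rough illustration of how such a location relates to its /translation, the sketch below parses only the simple forms seen in this record (plain `start..end` ranges, optionally wrapped in `join` and `complement`); partial markers such as `<` or `>` and remote references are not handled, and the function names are hypothetical.

```python
import re


def parse_location(location: str):
    """Return (strand, [(start, end), ...]) for a simple feature location."""
    text = re.sub(r"\s+", "", location)  # drop stray whitespace left by OCR
    strand = "-" if text.startswith("complement(") else "+"
    segments = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", text)]
    if not segments:
        raise ValueError(f"no start..end ranges found in {location!r}")
    return strand, segments


def spliced_length(segments) -> int:
    """Total number of bases covered by the listed segments (inclusive ends)."""
    return sum(end - start + 1 for start, end in segments)


# CDS of Y34D9A.6 and its translation, as given in the record above.
strand, exons = parse_location("join(31479..31562,32281..32514)")
protein = (
    "MSKAFVDGLLQSSKVVVFSKSYCPYCHKARAALESVNVKPDALQ"
    "WIEIDERKDCNEIQDYLGSLTGARSVPRVFINGKFFGGGDDTAAGAKNGKLAALLKET"
    "GAL"
)

# A complete CDS should span three bases per residue plus one stop codon.
print(strand, spliced_length(exons), 3 * (len(protein) + 1))
```

The same check is a quick way to spot a feature location damaged in transcription: if the spliced length and the translation length disagree, one of the two is incomplete.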



RESULT 10 
AC006735/C 
LOCUS AC006735 166214 bp DNA HTG 25-FEB-1999
DEFINITION Caenorhabditis elegans clone Y34D9, *** SEQUENCING IN PROGRESS ***,
4 unordered pieces.
ACCESSION AC006735
VERSION AC006735.3 GI:4309801
KEYWORDS HTG; HTGS_PHASE1.
SOURCE Caenorhabditis elegans.
ORGANISM Caenorhabditis elegans
Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida;
Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis.
REFERENCE 1 (bases 1 to 166214)
AUTHORS Waterston,R.H.
TITLE The sequence of Caenorhabditis elegans clone
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 166214)
AUTHORS Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (23-FEB-1999) Genome Sequencing Center, Washington
University School of Medicine, 4444 Forest Park Parkway, St. Louis,
MO 63108, USA
COMMENT On Mar 1, 1999 this sequence version replaced gi:4263429.
NOTE: This is a 'working draft' sequence. It currently
consists of 4 contigs. The true order of the pieces
is not known and their order in this sequence record is
arbitrary. Gaps between the contigs are represented as
runs of N, but the exact sizes of the gaps are unknown.
This record will be updated with the finished sequence
as soon as it is available and the accession number will
be preserved.

1 2055: contig of 2055 bp in length
2056 2069: gap of unknown length
2070 20299: contig of 18230 bp in length
20300 20313: gap of unknown length
20314 70328: contig of 50015 bp in length
70329 70342: gap of unknown length
70343 166214: contig of 95872 bp in length.
FEATURES Location/Qualifiers
source 1..166214
/organism="Caenorhabditis elegans"
/db_xref="taxon:6239"
/clone="Y34D9"
BASE COUNT 52633 a 31045 c 31031 g 51463 t 42 others
ORIGIN



Query Match 8.1%; Score 36; DB 2; Length 166214; 

Best Local Similarity 47.0%; Pred. No. 6.4; 

Matches 111; Conservative 0; Mismatches 125; Indels 0; Gaps 0; 

Qy 96 ccttggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctagga 155 

I I I I I I I III I I I I I I I I I I I I I I II 

Db 81072 CCTTGATTCTGAAACTTCAAAAAAAGCATATGCAATTCTAAAATTGTATTTTCTCTGCAG 81013 

Qy 156 aaatgacgatgacaaagacgagggcatcgggcaacatacttgttagccgtaatgacgacg 215 

I I I I I III I I I I I I I I I I I I 1 I I I I 

Db 81012 AAATCGAAAATCCAAGCTAGAGGTTGGTGATTTTCCGCCCGGAAAACTGAAAACCCGCCG 80953 

Qy 216 ggccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgct 275 

II II I I I I III II I I I I I II I I I I I I I 

Db 80952 AAAATTCCATTTTTTCCAGTGTTTTTCGTGCGGTTGTGTGCAAAAATACTTTATCGAGCT 80893 

Qy 276 ataagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattcaagat 331 

II II Mill M I I I I I I I I I I I I I I I I I I I I I I 
Db 80892 CTACAAGGATGGTACGAGGCTTCACGAGGCGTTAAACTGCCTGATAAAATCAAAAT 80837 



RESULT 11 

AC011081 

LOCUS AC011081 183980 bp DNA HTG 10-SEP-2000
DEFINITION Homo sapiens clone RP11-45019, WORKING DRAFT SEQUENCE, 20 unordered
pieces.
ACCESSION AC011081
VERSION AC011081.3 GI:10047669
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 183980) 

AUTHORS Birren,B., Linton,L., Nusbaum,C. and Lander,E.
TITLE Homo sapiens, clone RP11-45019
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 183980)
AUTHORS Birren,B., Linton,L., Nusbaum,C., Lander,E., Allen,N., Anderson,M.,
Baldwin,J., Barna,N., Beckerly,R., Boguslavkiy,L., Boukhgalter,B.,
Brown,A., Castle,A., Colangelo,M., Collins,S., Collymore,A.,
Cooke,P., DeArellano,K., Dewar,K., Domino,M., Donelan,L., Doyle,M.,
Ferreira,P., FitzHugh,W., Forrest,C., Funke,R., Gage,D.,
Galagan,J., Gardyna,S., Grant,G., Hagos,B., Heaford,A., Horton,L.,
Howland,J.C., Johnson,R., Jones,C., Kann,L., Karatas,A., Klein,J.,
Lehoczky,J., Lieu,C., Locke,K., Macdonald,P., Marquis,N.,
McEwan,P., McGurk,A., McKernan,K., McLaughlin,J., Meldrim,J.,
Morrow,J., Naylor,J., Norman,C.H., O'Connor,T., O'Donnell,P.,
Peterson,K., Pollara,V., Riley,R., Roy,A., Santos,R., Severy,P.,
Stange-Thomann,N., Stojanovic,N., Subramanian,A., Talamas,J.,
Tesfaye,S., Tirrell,A., Vassiliev,H., Vo,A., Wheeler,J., Wu,X.,
Wyman,D., Ye,W.J., Zimmer,A. and Zody,M.
TITLE Direct Submission
JOURNAL Submitted (01-OCT-1999) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT On Sep 10, 2000 this sequence version replaced gi:7637227.

All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html

Genome Center
Center: Whitehead Institute/MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu

Project Information
Center project name: L1215
Center clone name: 45_0_19

Summary Statistics 

Sequencing vector: M13; M77815; 100% of reads 
Chemistry: Dye-terminator Big Dye; 100% of reads 
Assembly program: Phrap; version 0.960731 
Consensus quality: 165551 bases at least Q40 
Consensus quality: 173368 bases at least Q30 
Consensus quality: 177359 bases at least Q20 
Insert size: 198000; agarose-fp 
Insert size: 182080; sum-of-contigs 
Quality coverage: 3.5 in Q20 bases; agarose-fp 
Quality coverage: 3.8 in Q20 bases; sum-of-contigs 



NOTE: This is a 'working draft' sequence. It currently 
consists of 20 contigs. The true order of the pieces 
is not known and their order in this sequence record is 
arbitrary. Gaps between the contigs are represented as 
runs of N, but the exact sizes of the gaps are unknown. 
This record will be updated with the finished sequence 
as soon as it is available and the accession number will 
be preserved. 

1 5302: contig of 5302 bp in length 



* 5303 5402: gap of 100 bp 

* 5403 6656: contig of 1254 bp in length 

* 6657 6756: gap of 100 bp 

* 6757 8184: contig of 1428 bp in length 

* 8185 8284: gap of 100 bp 

* 8285 10720: contig of 2436 bp in length 

* 10721 10820: gap of 100 bp 

* 10821 13684: contig of 2864 bp in length 

* 13685 13784: gap of 100 bp 

* 13785 15838: contig of 2054 bp in length 

* 15839 15938: gap of 100 bp 

* 15939 17986: contig of 2048 bp in length 

* 17987 18086: gap of 100 bp 

* 18087 21602: contig of 3516 bp in length 

* 21603 21702: gap of 100 bp 

* 21703 24984: contig of 3282 bp in length 

* 24985 25084: gap of 100 bp 

* 25085 29305: contig of 4221 bp in length 

* 29306 29405: gap of 100 bp 

* 29406 57392: contig of 27987 bp in length 

* 57393 57492: gap of 100 bp 

* 57493 64118: contig of 6626 bp in length 

* 64119 64218: gap of 100 bp 

* 64219 71303: contig of 7085 bp in length 

* 71304 71403: gap of 100 bp 

* 71404 79950: contig of 8547 bp in length 

* 79951 80050: gap of 100 bp 

* 80051 91968: contig of 11918 bp in length

* 91969 92068: gap of 100 bp 

* 92069 107529: contig of 15461 bp in length 

* 107530 107629: gap of 100 bp 

* 107630 128021: contig of 20392 bp in length 

* 128022 128121: gap of 100 bp 

* 128122 148757: contig of 20636 bp in length 

* 148758 148857: gap of 100 bp 

* 148858 178744: contig of 29887 bp in length 

* 178745 178844: gap of 100 bp 

* 178845 183980: contig of 5136 bp in length. 
FEATURES Location/Qualifiers 

source 1..183980
/organism="Homo sapiens"
/db_xref="taxon:9606"
/clone="RP11-45019"
/clone_lib="RPCI-11 Human Male BAC"
misc_feature 1..5302
/note="assembly_fragment
clone_end:SP6
vector_side:left"
misc_feature 5403..6656
/note="assembly_fragment"
misc_feature 6757..8184
/note="assembly_fragment"
misc_feature 8285..10720
/note="assembly_fragment"
misc_feature 10821..13684
/note="assembly_fragment"
misc_feature 13785..15838
/note="assembly_fragment"
misc_feature 15939..17986
/note="assembly_fragment"
misc_feature 18087..21602
/note="assembly_fragment"
misc_feature 21703..24984
/note="assembly_fragment"
misc_feature 25085..29305
/note="assembly_fragment"
misc_feature 29406..57392
/note="assembly_fragment"
misc_feature 57493..64118
/note="assembly_fragment"
misc_feature 64219..71303
/note="assembly_fragment"
misc_feature 71404..79950
/note="assembly_fragment"
misc_feature 80051..91968
/note="assembly_fragment"
misc_feature 92069..107529
/note="assembly_fragment"
misc_feature 107630..128021
/note="assembly_fragment"
misc_feature 128122..148757
/note="assembly_fragment"
misc_feature 148858..178744
/note="assembly_fragment"
misc_feature 178845..183980
/note="assembly_fragment
clone_end:T7
vector_side:right"
BASE COUNT 57357 a 33536 c 32891 g 58294 t 1902 others
ORIGIN 



Query Match 8.1%; Score 36; DB 2; Length 183980; 

Best Local Similarity 56.9%; Pred. No. 6.5;

Matches 66; Conservative 0; Mismatches 50; Indels 0; Gaps 0; 

Qy 46 aacaacctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcc 105

I I I I I I I II I II MM II II I I I I I I I I I I I I I I 
Db 25914 AACATCTTCAGTAGGCCCAGCCCGGTGGCTCACACCTGTAATCCCAGCACTCTGGGAGGC 25973 

Qy 106 caagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatga 161 

I I I I I I I I I I I I I I I II Ml Ml I II I I I I I I 
Db 25974 CGAGGAAGGCAGATCACGAGGTCATCCGATCGAGACCATCCTGGCTAACACAGTGA 26029 
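
Each record's BASE COUNT line should add up to the length given on its LOCUS line, which gives a simple consistency check when a record has been transcribed. A minimal sketch follows; the helper name `parse_base_count` is hypothetical, and it assumes the line consists of count/label pairs such as "57357 a".

```python
import re


def parse_base_count(line: str) -> dict:
    """Map each lowercase label on a BASE COUNT line to its integer count."""
    pairs = re.findall(r"(\d+)\s+([a-z]+)", line)
    return {label: int(count) for count, label in pairs}


# BASE COUNT line and LOCUS length of AC011081, as given in the record above.
counts = parse_base_count("BASE COUNT 57357 a 33536 c 32891 g 58294 t 1902 others")
locus_length = 183980

total = sum(counts.values())
print(counts)
print(total == locus_length)
```

Here the five counts sum to the 183980 bp stated on the LOCUS line.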



RESULT 12 
AL353592/C 
LOCUS AL353592 63325 bp DNA PRI 30-MAY-2001
DEFINITION Human DNA sequence from clone RP11-569012 on chromosome 13,
complete sequence.
ACCESSION AL353592
VERSION AL353592.9 GI:14272259
KEYWORDS HTG.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 63325)
AUTHORS Sycamore,N.
TITLE Direct Submission
JOURNAL Submitted (30-MAY-2001) Sanger Centre, Hinxton, Cambridgeshire,
CB10 1SA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone
requests: clonerequest@sanger.ac.uk
COMMENT On May 31, 2001 this sequence version replaced gi:13751329.
During sequence assembly data is compared from overlapping clones.
Where differences are found these are annotated as variations
together with a note of the overlapping clone name. Note that the
variation annotation may not be found in the sequence submission
corresponding to the overlapping clone, as we submit sequences with
only a small overlap as described above.

This sequence was finished as follows unless otherwise noted: all
regions were either double-stranded or sequenced with an alternate
chemistry or covered by high quality data (i.e., phred quality >=
30); an attempt was made to resolve all sequencing problems, such
as compressions and repeats; all regions were covered by at least
one plasmid subclone or more than one M13 subclone; and the
assembly was confirmed by restriction digest. The following
abbreviations are used to associate primary accession numbers given
in the feature table with their source databases: Em:, EMBL; Sw:,
SWISSPROT; Tr:, TREMBL; Wp:, WORMPEP; Information on the WORMPEP
database can be found at
http://www.sanger.ac.uk/Projects/C_elegans/wormpep This sequence
was generated from part of bacterial clone contigs of human
chromosome 13, constructed by the Sanger Centre Chromosome 13
Mapping Group. Further information can be found at
http://www.sanger.ac.uk/HGP/Chr13

RP11-569012 is from the library RPCI-11.2 constructed by the group
of Pieter de Jong. For further details see
http://www.chori.org/bacpac/home.htm
VECTOR: pBACe3.6

IMPORTANT: This sequence is not the entire insert of clone
RP11-569012 It may be shorter because we sequence overlapping
sections only once, except for a 100 base overlap.

The true left end of clone RP11-274M5 is at 63226 in this sequence.
The true right end of clone RP11-431P10 is at 100 in this sequence.
FEATURES Location/Qualifiers
source 1..63325
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="13"
/clone="RP11-569012"
/clone_lib="RPCI-11.2"
repeat_region 444..814
/note="MER7A repeat: matches 1..345 of consensus"
repeat_region 1032..1121
/note="MIR repeat: matches 80..165 of consensus"
repeat_region 2068..2169
/note="MIR repeat: matches 73..177 of consensus"
repeat_region 6485..6506
/note="11 copies 2 mer aa 100% conserved"
repeat_region 8486..8616
/note="MER94 repeat: matches 5..134 of consensus"
repeat_region . o y o o
/note="MIR repeat: matches 6..168 of consensus"
repeat_region QQCQ y y do . . yu4o
/note= inEjiu repeat: matcnes z/y. . j / ± or consensus
repeat_region OC\ A A y u 4 4 . . y / y 4
/note= lnbiD iiN 1 riKWAii repeat, matcnes i. . /o4 or consensus
repeat_region y / yo . 1 n 1 a 0 . 1U1 4 Z
/note="THE1B repeat: matches 1..364 of consensus"
repeat_region i a i a o 1U 1 4 o . 1 A A 1 "7 . 1 UZ 1 /
/note= IXIIK repeat: matcnes ibz. .zju or consensus
repeat_region 1 OD 1 U . i a n c o . 1 4 UOZ
/note="HAL1 repeat: matches 170..660 of consensus"
repeat_region 1 4 U DO . 1 / OCA
/note="HAL1 repeat: matches 627..915 of consensus"
repeat_region 1 A "3 O C 1 4 jo 0 . 1 / A O C . 1 4 4 o D
/note="MER94 repeat: matches 33..134 of consensus"
repeat_region . i d yz y
/note="24 copies 2 mer tt 75% conserved"
repeat_region 1 CJ C\ R A lo(J04 . . 1 OJJD
/note="AluSq repeat: matches 11..310 of consensus"
repeat_region O O A A 1 z z y 4 1 . . Z oUDl
/note="AluJb repeat: matches 5..120 of consensus"
repeat_region 243 / 4 . i / a a a . z 4 4 yz
/note= MIR repeat: matcnes lo / . . zr>y or consensus
repeat_region o / n A C Z 4 / 4 0 . 0 / *7 Q Q . z 4 / y y
/note="27 copies 2 mer aa 75% conserved"
repeat_region z o y 4 z . . Z b JZ J
/note="THE1C repeat: matches 3..371 of consensus"
repeat_region Z OJDO . n a a o i . oUZ ji
/note="L1PA16 repeat: matches 4376..6157 of consensus"
repeat_region oUZoZ . *3 A C Q Q . judo y
/note= L1PA4 repeat: matcnes 5/o9. . ol4b or consensus
repeat_region 3uo yu . . ozllU
/note="L1PA16 repeat: matches 2931..4376 of consensus"
repeat_region 32111..32533
/note="MLT2B repeat: matches 1..444 of consensus"
repeat_region 3Z 034 . . 3H U DO
/note="L1PA16 repeat: matches 1075..2931 of consensus"
repeat_region 3 4 U / b . . JOD4U
/note="L1PA4 repeat: matches 4576..6140 of consensus"
repeat_region o a c a a O C O T O . OOZ / 0
/note= LXrAio-lb repeat: matcnes 4/b. .iizi or consensus
repeat_region 36273..36337
/note= LlPAlo-16 repeat: matches 495..559 of consensus
repeat_region *3 £T o o a 3b3z9 . . 3 / oz 1
/note= LIPAlo-lo repeat: matcnes -694. ,4yz or consensus
repeat_region JOlUO . . JO / J /
/note= LiPAiJ repeat: matcnes oooo. .blob or consensus
repeat_region "D O 1 1 O 3o / /z . o o o o c . OOOOO
/note= r -b/iiy] repeax,. matcnes x. .±zj or consensus
repeat_region 41019..41398
/note="L1ME3 repeat: matches 5768..6148 of consensus"
repeat_region 41706..41745
/note="20 copies 2 mer tg 80% conserved"
repeat_region 43615..44111
/note="MER1A repeat: matches 1..527 of consensus"
repeat_region A A O Q Q 4 4 JO O . . 4 4 D J O
/note= riiK repeat: matcnes yz. .Z4y or consensus
repeat_region 4 DjUU . d £7 4 1 . f± D f *i 1
/note= LYiiK repeat, matcnes o. .zoo or tonbciibua
repeat_region 400DJ . /i q n "5 /i . 4 y U J 4
/note="MIR repeat: matches 79..247 of consensus"
repeat_region t yu J / . a q a n £ . 4 y 4 u 0
/note="MLT1A2 repeat: matches 1..370 of consensus"
repeat_region 4 y o o y . . ou i y y
/note= hiui repeat, matcnes i. . ouy or consensus
repeat_region jU /ll. RD7 ^fl . JU / JO
/note= 14 copies z iner ct :?zi5 consexveu
repeat_region JJJ / 1 . t; 0 C C CL
/note= o copies iz mer do? conservea
repeat_region jjD / z . . jj y / o
/note="AluSx repeat: matches 1..305 of consensus"
repeat_region . 3 4 ODD
/note= MLKb /l repeat: matcnes izz. . /uy or consensus
repeat_region d 4 y o o . . OOU4 O
/note="MER5A repeat: matches 103..162 of consensus"
repeat_region C/l QQC d 4 y y d . c c n 7 *5 . jjU to
/note="MER5A repeat: matches 9..88 of consensus"
repeat_region CtOI A JJilU . . D D D 1 4
/note= lifad repeat: matcnes doj / . ,oi4z or consensus
repeat_region C Q C T . C.CTOQ . o d / z y
/note="MLT1A1 repeat: matches 1..365 of consensus"
repeat_region d o 1 4 y . . jOZDZ
/note="MIR repeat: matches 79..196 of consensus"
repeat_region DO 4 4 z . CQ/t QQ . oo 4 o y
/note="4 copies 12 mer 89% conserved"
repeat_region /"AC CQ . oU oz y
/note="L1MA9 repeat: matches 6110..6158 of consensus"
repeat_region DUDjU . . D 1 1 1 D
/note= MhiKiA repeat : matcnes i . . oz / or consensus
repeat_region Dill / . . D11DU
/note= " T 1 M7\ Q ronoaf • m 2 f pVi 0 c f\ Pi f\ A CI 1 fl /~\ "F poti C Dfi Cll Q "
repeat_region 61152..61349
/note="L1MA10 repeat: matches 6140..6320 of consensus"
repeat_region 61527..61634
/note="MIR repeat: matches 51..160 of consensus"
repeat_region 61658..61711
/note="MER92B repeat: matches 4..58 of consensus"


BASE COUNT 19347 a 11930 c 11644 g 20404 t 
ORIGIN 



Query Match 8.0%; Score 35.6; DB 9; Length 63325; 

Best Local Similarity 58.5%; Pred. No. 7.6; 

Matches 62; Conservative 0; Mismatches 44; Indels 0; Gaps 0; 

Qy 220 atgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgctataa 279

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 
Db 11138 ATATAATGTAGATTAAGATATTAATGAATTTGTTTTCAGAATCTGTAATAGGGACCCTAA 11079 

Qy 280 gagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaatt 325 

I I I I I I I I II I I II II I I I I MM 

Db 11078 GAGCAGGGTGCTATCATTAC7W\CTCTTCCCTTTTTATTACTAATT 11033
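
The percentages printed above each alignment appear to follow directly from the reported counts. The sketch below recomputes them, assuming (this is inferred from the figures in this report, not taken from GenCore documentation) that Query Match is the score divided by the perfect score and that Best Local Similarity is the number of matches divided by the number of aligned columns. The function names are illustrative only.

```python
def query_match(score, perfect_score):
    """Percent of the perfect score reached by a hit (assumed definition)."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity(matches, mismatches, indels=0):
    """Percent of aligned columns that are identical (assumed definition)."""
    columns = matches + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Figures taken from the hit reported just above (perfect score 444).
print(query_match(35.6, 444))
print(best_local_similarity(62, 44))
```

With the values of the preceding hit, the two calls give 8.0 and 58.5, which agree with the printed Query Match and Best Local Similarity.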



RESULT 13
ZMA297901
LOCUS       ZMA297901     563 bp    mRNA            PLN       11-JAN-2001
DEFINITION  Zea mays mRNA for basal layer antifungal peptide (bap-la gene).
ACCESSION   AJ297901
VERSION     AJ297901.1  GI:12214246
KEYWORDS    bap-la gene; basal layer antifungal peptide.
SOURCE      Zea mays.
  ORGANISM  Zea mays
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Zea.
REFERENCE   1  (bases 1 to 563)
  AUTHORS   Serna Sanz,A. and Thompson,R.D.
  TITLE     Maize endosperm secretes a novel antifungal protein into adjacent
            maternal tissue
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 563)
  AUTHORS   Serna,A.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-OCT-2000) Serna A., Plant Physiology, Max Planck
            Institut, Carl von Linne Weg 10, Cologne, 50829, GERMANY
FEATURES             Location/Qualifiers
     source          1..563
                     /organism="Zea mays"
                     /variety="A69Y"
                     /db_xref="taxon:4577"
                     /tissue_type="endosperm"
                     /dev_stage="7 days after pollination"
     sig_peptide     40..123
                     /gene="bap-la"
     CDS             40..321
                     /gene="bap-la"
                     /function="putative antifungal peptide"
                     /codon_start=1
                     /product="basal layer antifungal peptide"
                     /protein_id="CAC21605.1"
                     /db_xref="GI:12214247"
                     /translation="MAKFFNYTIIQGLLMLSMVLLASCAIHAHIISGETEEVSNTGSP
                     TVMVTMGANRKIIEDNKNLLCYLRALEYCCARTRQCYDDIKKCLEHCRG"
     gene            40..321
                     /gene="bap-la"
     mat_peptide     124..318
                     /gene="bap-la"
                     /product="bap-la protein"
BASE COUNT      201 a     95 c    109 g    158 t
ORIGIN

Query Match 8.0%; Score 35.4; DB 8; Length 563;

Best Local Similarity 49.0%; Pred. No. 5.1;

Matches 127; Conservative 0; Mismatches 126; Indels 6; Gaps 1;



Qy 50 acctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaag 109 

I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 61 ACCATCATCCAAGGACTCTTGATGCTTTCCATGGTACTTCTGGCATCGTGCGCTATTCAT 120 

Qy 110 caagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgaca 169 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I III III III 
Db 121 GCACACATAATAAGTGGGGAAACTGAAGAGGTTAGCAACACAGGGAGCCCGACAGTGATG 180 

Qy 170 aagacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgctatcta 229

I I I I I I I I I I II I I I I I I I I I I I I I I I 

Db 181 GTCACGATGGGGGCAAACCGAAAGATAATTGAAGATAATAAAAATTTATTGTGCTATCTA 240

Qy 230 gattccggtcttaatgagtacgtctgcagaaagactaataagtgctataagagcttggtg 289 

I I MINI II I I I I I I I I I I I I I I II I 

Db 241 AGGGC------TCTAGAGTACTGTTGTGCAAGGACCAGACAATGCTATGATGACATAAAG 294

Qy 290 ctctgcgtggcgagttgtc 308 

I I I I I I I I I I I 
Db 295 AAATGCTTGGAGCATTGCC 313



RESULT 14
AL355521
LOCUS       AL355521   78874 bp    DNA             HTG       13-JUN-2001
DEFINITION  Homo sapiens chromosome X clone RP11-723E19, *** SEQUENCING IN
            PROGRESS ***, 22 unordered pieces.
ACCESSION   AL355521
VERSION     AL355521.4  GI:9863727
KEYWORDS    HTG; HTGS_PHASE1; HTGS_CANCELLED.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 78874)
  AUTHORS   Mclay,K.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-JUN-2001) Sanger Centre, Hinxton, Cambridgeshire,
            CB10 1SA, UK. E-mail enquiries: humquery@sanger.ac.uk Clone
            requests: clonerequest@sanger.ac.uk
COMMENT     On Aug 21, 2000 this sequence version replaced gi:9231037.

            Genome Center
              Center: Sanger Centre
              Center code: SC
              Web site: http://www.sanger.ac.uk
              Contact: humquery@sanger.ac.uk
            Project Information
              Center project name: bA723E19
            Summary Statistics
              Assembly program: XGAP4; version 4.5
              Sequencing vector: plasmid; L08752; 100% of reads
              Chemistry: Dye-terminator ET-amersham; 100% of reads
              Consensus quality: 62626 bases at least Q40
              Consensus quality: 68358 bases at least Q30
              Consensus quality: 72218 bases at least Q20
              Insert size: 76774; sum-of-contigs
              Insert size: 183906; agarose-fp
              Quality coverage: 2.07x in Q20 bases; sum-of-contigs
              Quality coverage: 1.51x in Q20 bases; agarose-fp

            NOTE: This is a 'working draft' sequence. It currently
            consists of 22 contigs. The true order of the pieces
            is not known and their order in this sequence record is
            arbitrary. Gaps between the contigs are represented as
            runs of N, but the exact sizes of the gaps are unknown.
            This record will be updated with the finished sequence
            as soon as it is available and the accession number will
            be preserved.

                1   2671: contig of 2671 bp in length
             2672   2771: gap of 100 bp
             2772   8329: contig of 5558 bp in length
             8330   8429: gap of 100 bp
             8430  11063: contig of 2634 bp in length
            11064  11163: gap of 100 bp
            11164  13181: contig of 2018 bp in length
            13182  13281: gap of 100 bp
            13282  16335: contig of 3054 bp in length
            16336  16435: gap of 100 bp
            16436  21088: contig of 4653 bp in length
            21089  21188: gap of 100 bp
            21189  23402: contig of 2214 bp in length
            23403  23502: gap of 100 bp
            23503  25575: contig of 2073 bp in length
            25576  25675: gap of 100 bp
            25676  32961: contig of 7286 bp in length
            32962  33061: gap of 100 bp
            33062  37451: contig of 4390 bp in length
            37452  37551: gap of 100 bp
            37552  40168: contig of 2617 bp in length
            40169  40268: gap of 100 bp
            40269  44181: contig of 3913 bp in length
            44182  44281: gap of 100 bp
            44282  49401: contig of 5120 bp in length
            49402  49501: gap of 100 bp
            49502  55754: contig of 6253 bp in length
            55755  55854: gap of 100 bp
            55855  58174: contig of 2320 bp in length
            58175  58274: gap of 100 bp
            58275  60640: contig of 2366 bp in length
            60641  60740: gap of 100 bp
            60741  64375: contig of 3635 bp in length
            64376  64475: gap of 100 bp
            64476  67814: contig of 3339 bp in length
            67815  67914: gap of 100 bp
            67915  70067: contig of 2153 bp in length
            70068  70167: gap of 100 bp
            70168  72968: contig of 2801 bp in length
            72969  73068: gap of 100 bp
            73069  75340: contig of 2272 bp in length
            75341  75440: gap of 100 bp
            75441  78874: contig of 3434 bp in length.
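
The contig and gap listing above can be checked mechanically: each stated length should equal end - start + 1, and each piece should begin one base after the previous one ends. The following is a minimal sketch of such a check; the function name, the regular expression, and the three-line sample are illustrative assumptions, not part of the record or of any GenBank tool.

```python
import re

# Three lines copied from the listing above, used as a small sample.
LAYOUT = """\
1 2671: contig of 2671 bp in length
2672 2771: gap of 100 bp
2772 8329: contig of 5558 bp in length
"""

PIECE = re.compile(r"(\d+)\s+(\d+):\s+(contig|gap) of (\d+) bp")


def check_layout(text):
    """Return (contig_bases, gap_bases); raise ValueError on an inconsistency."""
    totals = {"contig": 0, "gap": 0}
    expected_start = None
    for m in PIECE.finditer(text):
        start, end, kind, length = int(m[1]), int(m[2]), m[3], int(m[4])
        if end - start + 1 != length:
            raise ValueError(f"length mismatch at {start}..{end}")
        if expected_start is not None and start != expected_start:
            raise ValueError(f"pieces are not contiguous at {start}")
        expected_start = end + 1
        totals[kind] += length
    return totals["contig"], totals["gap"]


print(check_layout(LAYOUT))
```

Applied to the full listing, the 22 contig lengths sum to 76774 bases, which agrees with the "Insert size: 76774; sum-of-contigs" line, and the 21 gaps account for the remaining 2100 positions of the 78874 bp record.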
FEATURES             Location/Qualifiers
     source          1..78874
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="X"
                     /clone="RP11-723E19"
                     /clone_lib="RPCI-11.3"
     misc_feature    1..2671
                     /note="assembly_fragment:00756
                     fragment_chain:1"
     misc_feature    2772..8329
                     /note="assembly_fragment:00239
                     fragment_chain:1"
     misc_feature    8430..11063
                     /note="assembly_fragment:00798
                     fragment_chain:2"
     misc_feature    11164..13181
                     /note="assembly_fragment:00291
                     fragment_chain:2"
     misc_feature    13282..16335
                     /note="assembly_fragment:00029"
     misc_feature    16436..21088
                     /note="assembly_fragment:00042"
     misc_feature    21189..23402
                     /note="assembly_fragment:00139"
     misc_feature    23503..25575
                     /note="assembly_fragment:00155"
     misc_feature    25676..32961
                     /note="assembly_fragment:00215"
     misc_feature    33062..37451
                     /note="assembly_fragment:00227"
     misc_feature    37552..40168
                     /note="assembly_fragment:00290"
     misc_feature    40269..44181
                     /note="assembly_fragment:00333"
     misc_feature    44282..49401
                     /note="assembly_fragment:00387"
     misc_feature    49502..55754
                     /note="assembly_fragment:00398"
     misc_feature    55855..58174
                     /note="assembly_fragment:00549"
     misc_feature    58275..60640
                     /note="assembly_fragment:00556"
     misc_feature    60741..64375
                     /note="assembly_fragment:00581"
     misc_feature    64476..67814
                     /note="assembly_fragment:00589"
     misc_feature    67915..70067
                     /note="assembly_fragment:00718"
     misc_feature    70168..72968
                     /note="assembly_fragment:00889"
     misc_feature    73069..75340
                     /note="assembly_fragment:00949"
     misc_feature    75441..78874
                     /note="assembly_fragment:01032"
BASE COUNT    21338 a  16690 c  16975 g  21759 t   2112 others
ORIGIN 



Query Match 8.0%; Score 35.4; DB 2; Length 78874; 

Best Local Similarity 50.3%; Pred. No. 9; 

Matches 87; Conservative 0; Mismatches 86; Indels 0; Gaps 0;



Qy 61 ataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaa 120 

II I I I I I I I I III I I I II I I I I I I I III I I I I I I 

Db 18521 ATGCCCTGTGTGCTTTGGAACACGTGCACAACCACACCTTGTTCATCACCATCCCAGAAA 18580 

Qy 121 aagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggc 180 

II I I I I I I I I I I I I I I I I I III III 

Db 18581 CCCTGACGCAGGCAAAGAGCAGAGTTATTAACCCTACTTTACTGATGTGGATACTGAGGC 18640 

Qy 181 atcgggcaacatacttgttagccgtaatgacgacgggccatgctatctagatt 233 

II I I I I I I I I I I I I I I I I I I II I I I I I I 

Db 18641 CCAGAGGCTCATGCAAGTTATCAGTAAGTGGCAGGGACAGTTGCCTCTAGATT 18693 



RESULT 15
AC064821
LOCUS       AC064821  174986 bp    DNA             HTG       07-JUL-2000
DEFINITION  Homo sapiens chromosome 12 clone RP11-125G9, WORKING DRAFT
            SEQUENCE, 9 unordered pieces.
ACCESSION   AC064821
VERSION     AC064821.2  GI:7770020
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 174986)
  AUTHORS   Waterston,R.H.
  TITLE     The sequence of Homo sapiens clone
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 174986)
  AUTHORS   Waterston,R.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-APR-2000) Genome Sequencing Center, Washington
            University School of Medicine, 4444 Forest Park Parkway, St. Louis,
            MO 63108, USA
COMMENT     On May 11, 2000 this sequence version replaced gi:7637335.

            Genome Center
              Center: Washington University Genome Sequencing Center
              Center code: WUGSC
              Web site: http://genome.wustl.edu/gsc/index.shtml
            Project Information
              Center project name: H_NH0125G09
            Summary Statistics
              Sequencing vector: M13; 100%
              Sequencing vector: plasmid; 0%
              Chemistry: Dye-primer ET; 100% of reads
              Chemistry: Dye-terminator Big Dye; 0% of reads
              Assembly program: Phrap; version 0.990319
              Consensus quality: 170476 bases at least Q40
              Consensus quality: 171216 bases at least Q30
              Consensus quality: 171956 bases at least Q20
              Insert size: 189000; agarose-fp
              Insert size: 174186; sum-of-contigs
              Quality coverage: 5.78 in Q20 bases; agarose-fp
              Quality coverage: 6.32 in Q20 bases; sum-of-contigs

            NOTE: This is a 'working draft' sequence. It currently
            consists of 9 contigs. The true order of the pieces
            is not known and their order in this sequence record is
            arbitrary. Gaps between the contigs are represented as
            runs of N, but the exact sizes of the gaps are unknown.
            This record will be updated with the finished sequence
            as soon as it is available and the accession number will
            be preserved.

                 1    2309: contig of 2309 bp in length
              2310    2409: gap of 100 bp
              2410    5742: contig of 3333 bp in length
              5743    5842: gap of 100 bp
              5843   12706: contig of 6864 bp in length
             12707   12806: gap of 100 bp
             12807   23411: contig of 10605 bp in length
             23412   23511: gap of 100 bp
             23512   38025: contig of 14514 bp in length
             38026   38125: gap of 100 bp
             38126   55106: contig of 16981 bp in length
             55107   55206: gap of 100 bp
             55207   85998: contig of 30792 bp in length
             85999   86098: gap of 100 bp
             86099  124167: contig of 38069 bp in length
            124168  124267: gap of 100 bp
            124268  174986: contig of 50719 bp in length.
FEATURES             Location/Qualifiers
     source          1..174986
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="12"
                     /clone="RP11-125G9"
     misc_feature    1..2309
                     /note="assembly_name:Contig2"
     misc_feature    2410..5742
                     /note="assembly_name:Contig3"
     misc_feature    5843..12706
                     /note="assembly_name:Contig4"
     misc_feature    12807..23411
                     /note="assembly_name:Contig5"
     misc_feature    23512..38025
                     /note="assembly_name:Contig6"
     misc_feature    38126..55106
                     /note="assembly_name:Contig7
                     clone_end:T7
                     vector_side:right"
     misc_feature    55207..85998
                     /note="assembly_name:Contig8"
     misc_feature    86099..124167
                     /note="assembly_name:Contig9"
     misc_feature    124268..174986
                     /note="assembly_name:Contig10
                     clone_end:SP6
                     vector_side:right"
BASE COUNT    55960 a  31805 c  33275 g  53145 t    801 others
ORIGIN



Query Match 8.0%; Score 35.4; DB 2; Length 174986; 

Best Local Similarity 51.6%; Pred. No. 9.8; 

Matches 81; Conservative 0; Mismatches 76; Indels 0; Gaps 0;



Qy 207 atgacgacgggccatgctatctagattccggtcttaatgagtacgtctgcagaaagacta 266 

I I I I I I I I I I I I I I I I Mill I I I I I I I I I I I I I 
Db 70184 ATAAAGACGGAGGATGATATTTGTTCTAGAATCTTGAAAAGAAGGAATGCAAAAAGAGGC 70243 

Qy 267 ataagtgctataagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattc 326

I II I III I I I I I I I I I I I I I I I I I I I I I I 

Db 70244 AAAAGAAGTAAATTAAGATTATGTTCAGCGAAGTCATTTGTCTGGAAATTTATGTAACTC 70303

Qy 327 aagatactgcggagacatcatgatactgcggagacag 363 

II II II I I I I I I I I I I I I 

Db 70304 AAATAATTAGGGCAAACCTGTGATGTTTGGGATAAAG 70340



Search completed: February 7, 2002, 10:57:42 
Job time: 9388 sec 

GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on:         February 7, 2002, 10:59:37 ; Search time 428.31 Seconds
                (without alignments)
                888.731 Million cell updates/sec

Title:          US-09-394-745-6154
Perfect score:  444
Sequence:       1 cgaaaacactggtacccaaa tcccattttaagaaataaat 444

Scoring table:  IDENTITY_NUC
                Gapop 10.0 , Gapext 1.6

Searched:       930621 seqs, 428662619 residues

Total number of hits satisfying chosen parameters:      1861242



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database :      N_Geneseq_1101:*
                1:  /SIDS2/gcgdata/geneseq/geneseqn/NA1980.DAT:*
                2:  /SIDS2/gcgdata/geneseq/geneseqn/NA1981.DAT:*
                3:  /SIDS2/gcgdata/geneseq/geneseqn/NA1982.DAT:*
                4:  /SIDS2/gcgdata/geneseq/geneseqn/NA1983.DAT:*
                5:  /SIDS2/gcgdata/geneseq/geneseqn/NA1984.DAT:*
                6:  /SIDS2/gcgdata/geneseq/geneseqn/NA1985.DAT:*
                7:  /SIDS2/gcgdata/geneseq/geneseqn/NA1986.DAT:*
                8:  /SIDS2/gcgdata/geneseq/geneseqn/NA1987.DAT:*
                9:  /SIDS2/gcgdata/geneseq/geneseqn/NA1988.DAT:*
                10:  /SIDS2/gcgdata/geneseq/geneseqn/NA1989.DAT:*
                11:  /SIDS2/gcgdata/geneseq/geneseqn/NA1990.DAT:*
                12:  /SIDS2/gcgdata/geneseq/geneseqn/NA1991.DAT:*
                13:  /SIDS2/gcgdata/geneseq/geneseqn/NA1992.DAT:*
                14:  /SIDS2/gcgdata/geneseq/geneseqn/NA1993.DAT:*
                15:  /SIDS2/gcgdata/geneseq/geneseqn/NA1994.DAT:*
                16:  /SIDS2/gcgdata/geneseq/geneseqn/NA1995.DAT:*
                17:  /SIDS2/gcgdata/geneseq/geneseqn/NA1996.DAT:*
                18:  /SIDS2/gcgdata/geneseq/geneseqn/NA1997.DAT:*
                19:  /SIDS2/gcgdata/geneseq/geneseqn/NA1998.DAT:*
                20:  /SIDS2/gcgdata/geneseq/geneseqn/NA1999.DAT:*
                21:  /SIDS2/gcgdata/geneseq/geneseqn/NA2000.DAT:*
                22:  /SIDS2/gcgdata/geneseq/geneseqn/NA2001.DAT:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES

ALIGNMENTS



RESULT 1 
AAX90965 

ID AAX90965 standard; cDNA; 379 BP. 
XX 

AC AAX90965; 
XX 

DT 17-JAN-2000 (first entry) 
XX 

DE Maize basal endosperm transfer cell layer-2 cDNA. 
XX 

KW Maize basal endosperm transfer cell layer-2 specific protein; BETL-2; 

KW defensin supergene family; antimicrobial peptide; endosperm; promoter; 

KW grain development; regulatory element; transgenic plant; 

KW protein expression; BETL-specific expression; heterologous DNA;

KW solute partitioning; disease resistance; endosperm-derived product; 

KW cotton quality; aromatic oil; ss . 

XX 

OS Zea mays. 
XX 

FH Key Location/Qualifiers 

FT CDS 44..328

FT /*tag= a

FT /product= "Basal endosperm transfer cell layer-2 protein"
XX 

PN WO9950427-A2 . 
XX 

PD 07-OCT-1999. 
XX 

PF 26-MAR-1999; 99WO-EP02063 . 
XX 

PR 27-MAR-1998; 98EP-0105628 . 
XX 

PA (PLAC ) MAX PLANCK GES FOERDERUNG WISSENSCHAFTEN . 
XX 

PI Thompson RD, Yan G, Salamini F, Hueros G; 
XX 

DR WPI; 1999-610858/52. 

DR P-PSDB; AAY28847. 
XX 



PT New nucleic acid encoding three basal endosperm transfer cell layer 

PT proteins, used to produce transgenic plants with e.g. increased disease 

PT resistance and to identify specific modulators 

XX 

PS Claim 1; Page 64-65; 76pp; English. 
XX 

CC The present sequence encodes for the maize basal endosperm transfer cell 

CC layer-2 specific protein. This has homology to defensin supergene family 

CC of antimicrobial peptides. The basal region of endosperm is highly 

CC specialised to facilitate uptake of solutes during grain development. 

CC Vectors comprising this nucleic acid sequence operably linked to 

CC regulatory elements is used to produce transgenic plants. These plants 

CC have altered levels of BETL protein expression. The regulatory region of 

CC the promoter is used to provide BETL-specific expression of heterologous 

CC DNA; to modify solute partitioning in the endosperm; for disease 

CC resistance; to improve endosperm-derived products and to express enzymes 

CC that modify quality of cotton or aromatic oils. 

XX 

SQ Sequence 379 BP; 107 A; 85 C; 94 G; 93 T; 0 other; 



Query Match 64.0%; Score 284; DB 20; Length 379; 

Best Local Similarity 87.5%; Pred. No. 1.7e-84; 

Matches 322; Conservative 0; Mismatches 45; Indels 1; Gaps 1; 



Qy 3 aaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaat 62

II III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 II II 1 Mill II 1
Db 13 agactattgtagctcatatcatctgtcacccatggcgaagtgcagcagcttccaaggatt 72

Qy 63 aatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaa 122

1 II II III 1 1 1 1 1 1 1 II II 1 1 II III III II III 1 1 1 1 1 1 II
Db 73 attctggttgctttccatgattcttctagcatcctttgttgctcatgcacg-cacaacaa 131

Qy 123 gtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcat 182

i 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 It 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 132 gtgggcaaaccaaagaggacagcaatgctaggaacatgacgatgaccaagacgagggcat 191

Qy 183 cgggcaacatacttgttagccgtaatgacgacgggccatgctatctagattccggtctta 242

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II M 1 1 II 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 192 caggcaacatacttgttagccgtaatgacgacgggccatgctatctagattccggtctta 251

Qy 243 atgagtacgtctgcagaaagactaataagtgctataagagcttggtgctctgcgtggcga 302

1 1 II 1 1 1 1 1 1 1 1 1 1 1 II 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 M
Db 252 atgagtacgtctgcagaaagactaataagtgctataagagcttggtgctctgcgtggcga 311

Qy 303 gttgtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagaca 362

1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 II 1 II 1 1 1 1 1 1 1 1 II 1 1 1
Db 312 gttgtcaaccatcatcatgaattcatgatactgcggagacatcatgatactgcggagaca 371

Qy 363 gacggcca 370

1 1 1 1 1 1 1
Db 372 gacggcga 379





RESULT 2 
AAV17563 

ID AAV17563 standard; cDNA; 1920 BP. 



XX 

AC AAV17563; 
XX 

DT 10-JUN-1998 (first entry) 
XX 

DE Coding sequence for the alpha' subunit of beta-conglycinin.
XX 

KW Beta-conglycinin; soybean seed protein; transgenic plant; 

KW seed storage protein profile; ss. 

XX 

OS Glycine max. 
XX 

PN WO9747731-A2.
XX 

PD 18-DEC-1997. 
XX 

PF 10-JUN-1997; 97WO-US09743 . 
XX 

PR 14-JUN-1996; 96US-0019940 . 
XX 

PA (DUPO ) DU PONT DE NEMOURS & CO E I . 
XX 

PI Fader GM, Kinney AJ; 
XX 

DR WPI; 1998-052298/05. 
XX 

PT Suppression of specific classes of soybean seed protein genes - 

PT useful to change seed storage protein profiles of transgenic plants 

XX 

PS Disclosure; Page 30-31; 58pp; English. 
XX 

CC This sequence represents the coding sequence for the alpha' subunit of 

CC the soybean seed protein beta-conglycinin. The method of the invention is 

CC for reducing the quantity of a soybean seed storage protein (A) , such as 

CC beta-conglycinin, in soybeans. It comprises: (a) constructing a chimeric 

CC gene comprising: (i) a nucleic acid fragment encoding a promoter that is 

CC functional in the cells of soybean seeds; (ii) a nucleic acid fragment 

CC encoding all or a portion of (A) placed in sense or antisense orientation 

CC relative to the promoter of (i); and (iii) a transcriptional termination 

CC region; (b) creating a transgenic soybean cell by introducing into a 

CC soybean cell the chimeric gene of (a) ; and (c) growing the transgenic 

CC soybean cells of (b) under conditions that result in expression of the 

CC chimeric gene of (a) ; where the quantity of one or more members of a 

CC class of (A) subunits is reduced when compared to soybeans not containing 

CC the chimeric gene of (a) . The method is used to construct transgenic 

CC soybean lines where the expression of genes encoding (A) are modulated to 

CC effect a change in seed storage protein profile of transgenic plants. 

CC Modification of the seed storage protein profile can result in the 

CC production of novel soy protein products with unique and valuable 

CC functional characteristics. 
XX 

SQ Sequence 1920 BP; 634 A; 444 C; 449 G; 393 T; 0 other; 



Query Match 7.9%; Score 35.2; DB 19; Length 1920; 

Best Local Similarity 62.5%; Pred. No. 0.18; 

Matches 55; Conservative 0; Mismatches 33; Indels 0; Gaps 0; 



Qy 106 caagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgat 165 

I I I I I M I I I I II II I II I I I I I I I I I I I I I I I I I I 

Db 478 cacgaatggcaacacaagcaggaaaagcaccaaggaaaggaaagtgaagaagaagaagaa 537 

Qy 166 gacaaagacgagggcatcgggcaacata 193 

II I I I I I I I I I I I I I I I I I 

Db 538 gaccaagacgaggatgaggagcaagaca 565 



RESULT 3 
AAQ25079/c

ID AAQ25079 standard; DNA; 1404 BP. 
XX 

AC AAQ25079;
XX 

DT 17-NOV-1992 (first entry) 
XX 

DE Alpha-amylase variant encoding leucine at position 84. 
XX 

KW Mutant; maltose; malto-oligosaccharides; Saccharomycopsis; fibuligera;

KW polymerisation; DP; transglycosifier; ss.

XX 

OS Saccharomycopsis fibuligera. 
XX 

FH Key Location/Qualifiers 

FT mutation 329 

FT /*tag= a 

FT /note= "mutated to thymine" 
XX 

PN JP04108386-A. 
XX 

PD 09-APR-1992. 
XX 

PF 28-AUG-1990; 90JP-0226112.
XX

PR 28-AUG-1990; 90JP-0226112.
XX 

PA (AGEN ) AGENCY OF IND SCI &. 
XX 

DR WPI; 1992-171652/21. 

DR P-PSDB; AAR24136. 
XX 

PT Variant alpha-amylase gene for mfr. of malto-oligosaccharide ( s ) - 

PT is obtd. by mutating the nucleotide at position 329 of the 

PT Saccharomycopsis fibuligera wild-type sequence to thymine. 
XX 

PS Claim 1; Fig 1; 10pp; Japanese.
XX 

CC The variant alpha amylase gene was obtd. by mutating the 329th

CC nucleotide of the alpha amylase gene of Saccharomycopsis fibuligera

CC to T. This mutation results in substitution of the wild-type amino

CC acid at position 84 of alpha-amylase by leucine. The variant alpha-

CC amylase is high in transglycosifying activity. The variant alpha-

CC amylase may be used to prepare malto-oligosaccharides with a degree

CC of polymerisation (DP) of at least 7, by inversion of the malto-

CC oligosaccharide.



XX 

SQ Sequence 1404 BP; 407 A; 271 C; 294 G; 432 T; 0 other; 



Query Match 7.7%; Score 34.2; DB 13; Length 1404 ; 

Best Local Similarity 49.7%; Pred. No. 0.33; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACCTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 4 
AAQ77667/c

ID AAQ77667 standard; DNA; 1404 BP. 
XX 

AC AAQ77667;
XX 

DT 16-JUN-1995 (first entry) 
XX 

DE Variant alpha amylase deriv. from Saccharomyopsis fibuligera. 
XX 

KW alpha amylase; carbohydrate hydrolase; increased activity; 

KW tyrosine residue; enzyme centre; mass production; oligosaccharide; 

KW variant; cyclomaltodextrin glucanotransferase; ds.

XX 

OS Saccharomycopsis fibuligera. 
XX 

FH Key Location/Qualifiers 
FT misc_difference 247..249
FT /*tag= a 

FT /note= "the wild type sequence TAY was mutated to 

FT CTC to give a variant enzyme" 

XX 

PN JP06253836-A. 
XX 

PD 13-SEP-1994. 
XX 

PF 04-MAR-1993; 93 JP-0069303 . 
XX 

PR 04-MAR-1993; 93 JP-0069303 . 
XX 

PA (AGEN ) AGENCY OF IND SCI & TECHNOLOGY. 
XX 

DR WPI; 1994-328987/41. 
DR P-PSDB; AAR63186. 
XX 

PT Variant carbohydrate hydrolase (s) with increased activity - 



PT consists of e.g. alpha-amylase with tyrosine residue in enzyme 

PT centre, useful for mass-prodn. of oligosaccharide (s ) 

XX 

PS Example 1; Page 18-20; 27pp; Japanese. 
XX 

CC AAQ77665-8 encode variant alpha amylases, composed by substituting 

CC bases 247-249 of the structural gene region, with TTC, TGG, CTC or 

CC AAC . These substitutions result in the 83rd amino acid residue 

CC (tyrosine) of the wild type sequence being changed to phenylalanine, 

CC tryptophan, leucine or asparagine respectively. The substituted 

CC amino acid is present in the active site of the enzyme and confers 

CC increased activity on the enzyme. The variants are useful for the 

CC mass production of oligosaccharides, (see AAQ77669 for the variant 

CC structure of a cyclomaltodextrin glucanotransferase).
XX 

SQ Sequence 1404 BP; 406 A; 273 C; 293 G; 432 T; 0 other; 



Query Match 7.7%; Score 34.2; DB 15; Length 1404;

Best Local Similarity 49.7%; Pred. No. 0.33;

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0;

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170

1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III
Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225

1 1 1 1 1 1 1 1 II 1 1 II 1 1 1 1 1 1 1 1 1 1
Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515





RESULT 5 
AAQ77668/c 

ID AAQ77668 standard; DNA; 1404 BP. 
XX 

AC AAQ77668;
XX 

DT 16-JUN-1995 (first entry) 
XX 

DE Variant alpha amylase deriv. from Saccharomyopsis fibuligera. 
XX 

KW alpha amylase; carbohydrate hydrolase; increased activity; 

KW tyrosine residue; enzyme centre; mass production; oligosaccharide; 

KW variant; cyclomaltodextrin glucanotransferase; ds.

XX 

OS Saccharomycopsis fibuligera. 
XX 

FH Key Location/Qualifiers 
FT misc_difference 247..249
FT /*tag= a 

FT /note= "the wild type sequence TAY was mutated to 

FT AAC to give a variant enzyme" 

XX 



PN JP06253836-A. 
XX 

PD 13-SEP-1994. 
XX 

PF 04-MAR-1993; 93 JP-00 69303 . 
XX 

PR 04-MAR-1993; 93 JP-00 69303 . 
XX 

PA (AGEN ) AGENCY OF IND SCI & TECHNOLOGY. 
XX 

DR WPI; 1994-328987/41. 

DR P-PSDB; AAR63187. 
XX 

PT Variant carbohydrate hydrolase (s) with increased activity - 

PT consists of e.g. alpha-amylase with tyrosine residue in enzyme 

PT centre, useful for mass-prodn. of oligosaccharide ( s ) 
XX 

PS Example 1; Page 20-23; 27pp; Japanese. 
XX 

CC AAQ77665-8 encode variant alpha amylases, composed by substituting

CC bases 247-249 of the structural gene region, with TTC, TGG, CTC or 

CC AAC. These substitutions result in the 83rd amino acid residue 

CC (tyrosine) of the wild type sequence being changed to phenylalanine, 

CC tryptophan, leucine or asparagine respectively. The substituted 

CC amino acid is present in the active site of the enzyme and confers 

CC increased activity on the enzyme. The variants are useful for the 

CC mass production of oligosaccharides, (see AAQ77669 for the variant 

CC structure of a cyclomaltodextrin glucanotransf erase) . 
XX 

SQ Sequence 1404 BP; 408 A; 272 C; 293 G; 431 T; 0 other; 



Query Match 7.7%; Score 34.2; DB 15; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.33; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I II I I III I III I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630 

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II I I I I II I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II II I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 6 
AAQ77665/c

ID AAQ77665 standard; DNA; 1404 BP. 
XX 

AC AAQ77665; 
XX 

DT 16-JUN-1995 (first entry) 
XX 



DE Variant alpha amylase deriv. from Saccharomyopsis fibuligera. 
XX 

KW alpha amylase; carbohydrate hydrolase; increased activity; 

KW tyrosine residue; enzyme centre; mass production; oligosaccharide; 

KW variant; cyclomaltodextrin glucanotransf erase ; ds . 

XX 

OS Saccharomycopsis fibuligera. 
XX 

FH Key Location/Qualifiers 

FT misc_difference 247.. 249 

FT /*tag= a 

FT /note= "the wild type sequence TAY was mutated to 

FT TTC to give a variant enzyme" 

XX 

PN JP06253836-A. 
XX 

PD 13-SEP-1994. 
XX 

PF 04-MAR-1993; 93 JP-0069303 . 
XX 

PR 04-MAR-1993; 93 JP-0069303 . 
XX 

PA (AGEN ) AGENCY OF IND SCI & TECHNOLOGY. 
XX 

DR WPI; 1994-328987/41. 

DR P-PSDB; AAR63184. 
XX 

PT Variant carbohydrate hydrolase (s) with increased activity - 

PT consists of e.g. alpha-amylase with tyrosine residue in enzyme 

PT centre, useful for mass-prodn. of oligosaccharide(s)
XX 

PS Example 1; Page 13-15; 27pp; Japanese. 
XX 

CC AAQ77665-8 encode variant alpha amylases, composed by substituting 

CC bases 247-249 of the structural gene region, with TTC, TGG, CTC or 

CC AAC . These substitutions result in the 83rd amino acid residue 

CC (tyrosine) of the wild type sequence being changed to phenylalanine, 

CC tryptophan, leucine or asparagine respectively. The substituted 

CC amino acid is present in the active site of the enzyme and confers 

CC increased activity on the enzyme. The variants are useful for the 

CC mass production of oligosaccharides, (see AAQ77669 for the variant 

CC structure of a cyclomaltodextrin glucanotransferase).
XX 

SQ Sequence 1404 BP; 406 A; 272 C; 293 G; 433 T; 0 other; 



Query Match 7.7%; Score 34.2; DB 15; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.33; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0;

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630 

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570



Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I I I I I I I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 7 
AAQ77666/C 

ID AAQ77666 standard; DNA; 1404 BP. 
XX 

AC AAQ77666;
XX 

DT 16-JUN-1995 (first entry) 
XX 

DE Variant alpha amylase deriv. from Saccharomyopsis fibuligera. 
XX 

KW alpha amylase; carbohydrate hydrolase; increased activity; 

KW tyrosine residue; enzyme centre; mass production; oligosaccharide; 

KW variant; cyclomaltodextrin glucanotransferase; ds.

XX 

OS Saccharomycopsis fibuligera. 
XX 

FH Key Location/Qualifiers 

FT misc_difference 247..249

FT /*tag= a 

FT /note= "the wild type sequence TAY was mutated to 

FT TGG to give a variant enzyme" 

XX 

PN JP06253836-A. 
XX 

PD 13-SEP-1994. 
XX 

PF 04-MAR-1993; 93JP-0069303.
XX

PR 04-MAR-1993; 93JP-0069303.
XX 

PA (AGEN ) AGENCY OF IND SCI & TECHNOLOGY. 
XX 

DR WPI; 1994-328987/41. 

DR P-PSDB; AAR63185. 
XX 

PT Variant carbohydrate hydrolase(s) with increased activity -

PT consists of e.g. alpha-amylase with tyrosine residue in enzyme

PT centre, useful for mass-prodn. of oligosaccharide(s)
XX 

PS Example 1; Page 15-17; 27pp; Japanese. 
XX 

CC AAQ77665-8 encode variant alpha amylases, composed by substituting 

CC bases 247-249 of the structural gene region, with TTC, TGG, CTC or 

CC AAC. These substitutions result in the 83rd amino acid residue 

CC (tyrosine) of the wild type sequence being changed to phenylalanine, 

CC tryptophan, leucine or asparagine respectively. The substituted 

CC amino acid is present in the active site of the enzyme and confers 

CC increased activity on the enzyme. The variants are useful for the 

CC mass production of oligosaccharides, (see AAQ77669 for the variant 

CC structure of a cyclomaltodextrin glucanotransferase).
XX 



SQ Sequence 1404 BP; 406 A; 271 C; 295 G; 432 T; 0 other; 



Query Match 7.7%; Score 34.2; DB 15; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.33; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I ' I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 8 
AAN70916/c 

ID AAN70916 standard; DNA; 4214 BP. 
XX 

AC AAN70916;
XX 

DT 03-MAY-1991 (first entry) 
XX 

DE Sequence encoding alpha-amylase from plasmid pSf alpha 1. 
XX 

KW Amylase; ds.
XX 

OS Saccharomyces fibuligera HUT7212. 
XX 

FH Key Location/Qualifiers 
FT CDS 1531..3015

FT /*tag= a 

XX 

PN JP62104576-A. 
XX 

PD 15-MAY-1987. 
XX 

PF 31-OCT-1985; 85JP-0244892.
XX

PR 31-OCT-1985; 85JP-0244892.
XX 

PA (FUKU/) FUKUI S. 
XX 

DR WPI; 1987-173694/25. 
DR P-PSDB; AAP70571. 
XX 

PT Amylase prodn. - comprises culturing microorganism transformed
PT with vector deoxyribonucleic acid, accumulating and collecting 
PT amylase 
XX 

PS Disclosure; Fig 1; 14pp; Japanese. 
XX 



CC The plasmid may be used to transform an E.coli expression system for 

CC the stable production of amylase, useful in ethanol fermentation. 

CC See also AAN70917. 
XX 

SQ Sequence 4214 BP; 1249 A; 784 C; 860 G; 1321 T; 0 other; 



Query Match 7.7%; Score 34.2; DB 8; Length 4214; 

Best Local Similarity 49.7%; Pred. No. 0.55; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0;

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I 1! I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 2297 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 2238

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

IN I II II I I I I I * I I I I I II I II I III 
Db 2237 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 2178

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I I 

Db 2177 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 2123 



RESULT 9
AAF28548

ID AAF28548 standard; DNA; 96109 BP.
XX
AC AAF28548;
XX
DT 04-APR-2001 (first entry)
XX
DE Genomic fragment #35.
XX
KW Genomic library; bacteria; human upper airway; otitis media; sinusitis;
KW bronchopulmonary; endocarditis; meningitis; ss.
XX
OS Moraxella catarrhalis.
XX
PN WO200078968-A2.
XX
PD 28-DEC-2000.
XX
PF 16-JUN-2000; 2000WO-US16649.
XX
PR 18-JUN-1999; 99US-0140121.
XX
PA (INCY-) INCYTE GENOMICS INC.
XX
PI Lagace RE, Patterson C, Berg KL;
XX
DR WPI; 2001-041427/05.
XX
PT Genomic library for identifying diagnostic and therapeutic
PT compositions, and for identifying virulence factors, regulatory
PT elements and drug targets, comprises Moraxella catarrhalis nucleic
PT acids -
XX 

PS Claim 1; Page 345-368; 545pp; English. 
XX 

CC The present invention relates to a Moraxella catarrhalis genomic library 

CC comprising of a combination of 41 nucleic acid molecules (see 

CC AAF28514-AAF28554). The library has a number of uses described in the

CC specification e.g. is useful for identifying diagnostic and therapeutic

CC compositions. M. catarrhalis (Branhamella catarrhalis) is a large 

CC aerobic, gram-negative diplococcus, normally found among the bacterial 

CC flora of human upper airways. M. catarrhalis is known to cause acute, 

CC localised infections such as otitis media, sinusitis and bronchopulmonary 

CC infection and life-threatening, systemic diseases including endocarditis 

CC and meningitis. 

XX 

SQ Sequence 96109 BP; 28783 A; 18910 C; 20341 G; 28075 T; 0 other; 



Query Match 7.6%; Score 33.8; DB 22; Length 96109; 

Best Local Similarity 53.4%; Pred. No. 3.2; 

Matches 71; Conservative 0; Mismatches 62; Indels 0; Gaps 0; 

Qy 2 gaaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaa 61 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 III III 
Db 80672 gaaatcgctaatgcccatcccaaacctgacccaataccatatacaatagactcaccaaaa 80731 

Qy 62 taatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaa 121 

III II I I I I I I I I I I II I I I I I II I II 

Db 80732 ttataatcacgctcaaccataaacaatacaccacccatgatggcacagttcactgtaatc 80791

Qy 122 agtgggcaaacca 134 

I I I I I I'l I 
Db 80792 aagggtaaaaaca 80804 



RESULT 10 
AAC46213/C 

ID AAC46213 standard; DNA; 1120 BP. 
XX 

AC AAC46213;
XX 

DT 18-OCT-2000 (first entry) 
XX 

DE Arabidopsis thaliana DNA fragment SEQ ID NO: 49314. 
XX 

KW Hybridisation assay; genetic mapping; gene expression control; 

KW protein identification; signal transduction pathway; 

KW metabolic pathway; promoter; termination sequence; ss. 
XX 

OS Arabidopsis thaliana. 
XX 

PN EP1033405-A2. 
XX 

PD 06-SEP-2000. 
XX 

PF 25-FEB-2000; 2000EP-0301439.
XX

PR 25-FEB-1999; 99US-0121825.
Query Match 7.5%; Score 33.4; DB 21; Length 1120;

Best Local Similarity 52.2%; Pred. No. 0.55; 

Matches 70; Conservative 0; Mismatches 64; Indels 0; Gaps 0; 

Qy 305 tgtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagacaga 364

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 767 TCTTCATCATCGTCATAATATTCATCACCTGCGCGGACAGAGTGAAACTCAGGAGGTGGA 708

Qy 365 cggccagagatgangctagctagatgccgtttcaccannatattatgtaacacccaaatc 424 

II I III II I I I I I II I I III I I I I I I 

Db 707 GGACAGTCTTCGGTTATGGCCTCATCACGTTTCATCACAAGATGCCTAATCACCCTCTCA 648

Qy 425 tcccattttaagaa 438 

III I I I I I I 

Db 647 TCACTATCTAACAA 634



RESULT 11 
AAC35071/c 



ID AAC35071 standard; DNA; 1123 BP.
XX
AC AAC35071;
XX
DT 17-OCT-2000 (first entry)
XX
DE Arabidopsis thaliana DNA fragment SEQ ID NO: 8891.
XX
KW Hybridisation assay; genetic mapping; gene expression control;
KW protein identification; signal transduction pathway;
KW metabolic pathway; promoter; termination sequence; ss.
XX
OS Arabidopsis thaliana.
XX
PN EP1033405-A2.
XX
PD 06-SEP-2000.
XX
PF 25-FEB-2000; 2000EP-0301439.
XX
PR 25-FEB-1999; 99US-0121825.
PR 05-MAR-1999; 99US-0123180.
PR 09-MAR-1999; 99US-0123548.
PR 23-MAR-1999; 99US-0125788.
PR 25-MAR-1999; 99US-0126264.
PR 29-MAR-1999; 99US-0126785.
PR 01-APR-1999; 99US-0127462.
PR 06-APR-1999; 99US-0128234.
PR 08-APR-1999; 99US-0128714.
99US-0149723
99US-0149929 
99US-0149902 
99US-0149930 
99US-0150566 
99US-0150884 
99US-0151065 
99US-0151066 
99US-0151080 
99US-0151303 
99US-0151438 
99US-0151930 
99US-0152363 
99US-0153070 
99US-0153758
99US-0154018 
99US-0154039 
99US-0154779 
99US-0155139 
99US-0155486 
99US-0155659 
99US-0156458 
99US-0156596 
99US-0157117 
99US-0157753 
99US-0157865 
99US-0158029 
99US-0158232 
99US-0158369 
99US-0159293 
99US-0159294 
99US-0159295 
99US-0159329 
99US-0159330 
99US-0159331 
99US-0159637 
99US-0159638 
99US-0159584 
99US-0160741 
99US-0160767 
99US-0160768 
99US-0160770 
99US-0160814 
99US-0160815 
99US-0160980 
99US-0160981 
99US-0160989 
99US-0161404 
99US-0161405 
99US-0161406 
99US-0161359 
99US-0161360 
99US-0161361 
99US-0161920 
99US-0161992 
99US-0161993 
99US-0162142 



Query Match 7.5%; Score 33.4; DB 21; Length 1123; 

Best Local Similarity 52.2%; Pred. No. 0.55; 

Matches 70; Conservative 0; Mismatches 64; Indels 0; Gaps 0;

Qy 305 tgtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagacaga 364 

II I I I I I I M I I I I I I I I I III! I I I I I I MM II 

Db 770 TCTTCATCATCGTCATAATATTCATCACCTGCGCGGACAGAGTGAAACTCAGGAGGTGGA 711 

Qy 365 cggccagagatgangctagctagatgccgtttcaccannatattatgtaacacccaaatc 424 

II I III II I I I II M I I III I II I II 

Db 710 GGACAGTCTTCGGTTATGGCCTCATCACGTTTCATCACAAGATGCCTAATCACCCTCTCA 651 

Qy 425 tcccattttaagaa 438 

III I I I I I I 

Db 650 TCACTATCTAACAA 637 



RESULT 12 
AAI24374/C 

ID AAI24374 standard; DNA; 273 BP. 
XX 

AC AAI24374; 
XX 

DT 12-OCT-2001 (first entry) 
XX 

DE Probe #14307 for gene expression analysis in human cervical cell sample 
XX 

KW Probe; human; microarray; gene expression; cervical epithelial cell; 

KW cervical cancer; ss. 

XX 

OS Homo sapiens.
XX 

PN WO200157278-A2. 
XX 

PD 09-AUG-2001. 
XX 

PF 30-JAN-2001; 2001WO-US00670.
XX

PR 04-FEB-2000; 2000US-0180312.

PR 26-MAY-2000; 2000US-0207456.

PR 30-JUN-2000; 2000US-0608408.

PR 03-AUG-2000; 2000US-0632366.

PR 21-SEP-2000; 2000US-0234687.

PR 27-SEP-2000; 2000US-0236359.

PR 04-OCT-2000; 2000GB-0024263.
XX 

PA (MOLE- ) MOLECULAR DYNAMICS INC. 
XX 

PI Penn SG, Hanzel DK, Chen W, Rank DR; 

XX
DR WPI; 2001-488901/53.
XX 

PT Human genome-derived single exon nucleic acid probes useful for 

PT analyzing gene expression in human cervical epithelial cells - 
XX 

PS Claim 25; SEQ ID No 14307; 487pp; English. 



XX 

CC The present invention relates to human single exon nucleic acid probes 

CC (SENP). The present sequence is one such probe. The SENPs are derived

CC from human HeLa cells. The SENPs can be used to produce a single exon 

CC microarray, which can be used for measuring human gene expression in a 

CC sample derived from human cervical epithelial cells. By measuring gene 

CC expression, the probes are therefore useful in grading and/or staging 

CC of diseases of the cervix, notably cervical cancer. 

CC Note: The sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences.
XX 

SQ Sequence 273 BP; 65 A; 54 C; 71 G; 83 T; 0 other; 



Query Match 7.3%; Score 32.4; DB 22; Length 273; 

Best Local Similarity 56.6%; Pred. No. 0.62; 

Matches 60; Conservative 0; Mismatches 46; Indels 0; Gaps 0; 

Qy 71 tgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggcaa 130 

M I I I M I I I I I I I I I I II I I I I I I I I I I I I 

Db 173 TGCCTTCCCCACCTCCCCCTCTTTCCCCTCCTCCTACTACCAGCCCCCATAAACAGACAG 114

Qy 131 accaaagaggacagcaatgctaggaaaatgacgatgacaaagacga 176

I I I I I I I I I II I I I I I I I I I II I I I I III 
Db 113 AAGACAAAGGAGTTCAATGTGAGGAAGAGGAAGAAGAGAAGAAAGA 68 



RESULT 13 
AAI09913/C 

ID AAI09913 standard; DNA; 273 BP. 
XX 

AC AAI09913; 
XX 

DT 09-OCT-2001 (first entry) 
XX 

DE Probe #9904 used to measure gene expression in human breast sample. 
XX 

KW Probe; human; breast disease; breast cancer; development disorder; ss; 

KW inflammatory disease; proliferative breast disease; non-carcinoma tumour. 
XX 

OS Homo sapiens. 
XX 

PN WO200157270-A2. 
XX 

PD 09-AUG-2001. 
XX 

PF 29-JAN-2001; 2001WO-US00661.
XX

PR 04-FEB-2000; 2000US-0180312.

PR 26-MAY-2000; 2000US-0207456.

PR 30-JUN-2000; 2000US-0608408.

PR 03-AUG-2000; 2000US-0632366.

PR 21-SEP-2000; 2000US-0234687.

PR 27-SEP-2000; 2000US-0236359.

PR 04-OCT-2000; 2000GB-0024263.
XX 



PA (MOLE- ) MOLECULAR DYNAMICS INC. 
XX 

PI Penn SG, Hanzel DK, Chen W, Rank DR; 
XX 

DR WPI; 2001-476286/51. 
XX 

PT Novel single exon nucleic acid probe used to measuring gene expression 

PT in a human breast - 

XX 

PS Claim 25; SEQ ID No 9904; 322pp; English. 
XX 

CC The present invention relates to novel single exon nucleic acid probes. 

CC The present sequence is one such probe. The probes are useful for 

CC measuring human gene expression in a human breast sample, where the probe 

CC hybridises at high stringency to a nucleic acid expressed in the human 

CC breast. The probes are useful for predicting, diagnosing, grading, 

CC staging, monitoring and prognosing diseases of the human breast, 

CC particularly those diseases with polygenic aetiology. The diseases 

CC include: breast cancer, disorders of development, inflammatory diseases 

CC of the breast, fibrocystic changes, proliferative breast disease and 

CC non-carcinoma tumours.

CC Note: The sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences.
XX 

SQ Sequence 273 BP; 65 A; 54 C; 71 G; 83 T; 0 other; 



Query Match 7.3%; Score 32.4; DB 22; Length 273; 

Best Local Similarity 56.6%; Pred. No. 0.62; 

Matches 60; Conservative 0; Mismatches 46; Indels 0; Gaps 0; 

Qy 71 tgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggcaa 130 

I I I I I I I I I I I I I I I I I II I I I I I I I I I III 

Db 173 TGCCTTCCCCACCTCCCCCTCTTTCCCCTCCTCCTACTACCAGCCCCCATAAACAGACAG 114

Qy 131 accaaagaggacagcaatgctaggaaaatgacgatgacaaagacga 176
       |  | | ||||   |||||  ||||| | || || || ||  | ||
Db 113 AAGACAAAGGAGTTCAATGTGAGGAAGAGGAAGAAGAGAAGAAAGA 68



RESULT 14 
AAQ49754/C 

ID AAQ49754 standard; DNA; 7607 BP. 
XX 

AC AAQ49754;
XX 

DT 10-MAR-1994 (first entry)
XX 

DE pTK gene LpTK-2. 
XX 

KW pTK; protein tyrosine kinase; catalytic domain; c-kit; megakaryocyte; 
KW lymphocyte; amplification; primer; polymerase chain reaction; PCR; ds.
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 



FT CDS 1858..3375

FT /*tag= a 
XX 

PN WO9315201-A. 
XX 

PD 05-AUG-1993. 
XX 

PF 22-JAN-1993; 93WO-US00586.
XX

PR 22-JAN-1992; 92US-0826935.
XX 

PA (NEWE-) NEW ENGLAND DEACONESS HOSPITAL. 
XX 

PI Avraham H, Cowley S, Groopman J, Scadden D; 
XX 

DR WPI; 1993-320330/40. 

DR P-PSDB; AAR41941. 
XX 

PT New protein tyrosine kinase genes and proteins encoded by genes - 

PT are of human mega-karyocytic origin 

XX 

PS Claim 2; Fig 5; 60pp; English. 
XX 

CC pTK genes were identified using two sets of degenerative 

CC oligonucleotide primers: a first set which amplifies all pTK DNA 

CC segments (AAQ49743-44), and a second set which amplifies highly

CC conserved sequences present in the catalytic domain of the c-kit

CC subgroup of pTKs (AAQ49745-46). The pTK genes identified are described

CC in AAQ49747-57 and AAR41897-02. 

CC The LpTKs are expressed in lymphocytic cells, as well as 

CC megakaryocytic cells. The partial and full-length LpTK2 gene 

CC sequences are given in AAQ49749 and AAQ49754 respectively. The 

CC protein sequence corresp. to AAQ49749 is claimed (claim 7) and 

CC stated as given in the specification, however is missing from 

CC the publication. 

XX 

SQ Sequence 7607 BP; 1953 A; 1851 C; 1694 G; 2109 T; 0 other; 



Query Match 7.2%; Score 31.8; DB 14; Length 7607; 

Best Local Similarity 50.3%; Pred. No. 4.5; 

Matches 78; Conservative 0; Mismatches 77; Indels 0; Gaps 0; 

Qy 217 gccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgcta 276 

I I I I II I I I I I I I I I I I I I I I I I I I I II 

Db 2207 GACACGAAATAAAGCTGCCGGTGAAGTGGACTGCGCCCGAAGCCATTCGTAGTAATAAAT 2148

Qy 277 taagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattcaagatactgc 336

III I III II III I I I I I I I I I I I I I I I I II I 
Db 2147 TCAGCATTAAGTCCGATGTATGGTCATTTGGAATCCTTCTTTATGAAATCATTACTTATG 2088

Qy 337 ggagacatcatgatactgcggagacagacggccag 371 

III I I I I I I II I I I I I I I I 
Db 2087 GCAAAATGCCTTACAGTGGTATGACAGGTGCCCAG 2053 



RESULT 15 



AAT03097/C 

ID AAT03097 standard; DNA; 7607 BP. 
XX 

AC AAT03097; 
XX 

DT 14-FEB-1996 (first entry) 
XX 

DE Protein tyrosine-kinase LpTK2 gene. 
XX 

KW Protein tyrosine-kinase; pTK; LpTK2; agonist; cell growth; 

KW differentiation; ss. 

XX 

OS Homo sapiens. 
XX 

PN WO9527061-A1.
XX 

PD 12-OCT-1995. 
XX 

PF 04-APR-1995; 95WO-US04228.
XX

PR 04-APR-1994; 94US-0222616.
XX 

PA (GETH ) GENENTECH INC. 
XX 

PI Bennett BD, Goeddel D, Lee JM, Matthews W, Tsai SP; 

PI Wood WI; 

XX 

DR WPI; 1995-366160/47. 

DR P-PSDB; AAR85929. 
XX 

PT Agonist antibodies which activate specific protein tyrosine 

PT kinase(s) - also activate chimeric proteins of kinase extracellular

PT domain and Ig constant domain, useful for studying, and therapeutic 

PT modulation of, cell growth and differentiation 

XX 

PS Disclosure; Page 48-56; 125pp; English. 
XX 

CC DNA probes based on protein tyrosine-kinase (pTK) sequences were used 

CC to screen cDNA libraries to identify novel pTK genes. A LpTK2 gene 

CC (AAT03097) was isolated from lymphocytic and megakaryocyte cell 

CC libraries. The gene can be used to produce recombinant LpTK2, to 

CC identify other new pTK genes, or to design drugs, peptides or 

CC antisense constructs that modulate pTK activity. 

XX 

SQ Sequence 7607 BP; 1954 A; 1851 C; 1693 G; 2109 T; 0 other; 



Query Match 7.2%; Score 31.8; DB 16; Length 7607; 

Best Local Similarity 50.3%; Pred. No. 4.5; 

Matches 78; Conservative 0; Mismatches 77; Indels 0; Gaps 0; 

Qy 217 gccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgcta 276 

I I I I II I I I I I I I I I I I I I I I I I I I I II 

Db 2207 GACACGAAATAAAGCTGCCGGTGAAGTGGACTGCGCCCGAAGCCATTCGTAGTAATAAAT 2148

Qy 277 taagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattcaagatactgc 336
III I III II III I III I I I I I I I I I I I I I I I 



Db 2147 TCAGCATTAAGTCCGATGTATGGTCATTTGGAATCCTTCTTTATGAAATCATTACTTATG 2088



Qy 337 ggagacatcatgatactgcggagacagacggccag 371 

III I I I I I I I I I I I I I I I I 
Db 2087 GCAAAATGCCTTACAGTGGTATGACAGGTGCCCAG 2053 



Search completed: February 7, 2002, 10:59:55 
Job time: 4981 sec 



GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on: February 7, 2002, 10:51:40 ; Search time 172.96 Seconds
(without alignments)
581.384 Million cell updates/sec

Title: US-09-394-745-6154
Perfect score: 444
Sequence: 1 cgaaaacactggtacccaaa tcccattttaagaaataaat 444

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 351203 seqs, 113238999 residues

Total number of hits satisfying chosen parameters: 702406

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : Issued_Patents_NA:*
1: /cgn2_6/ptodata/2/ina/5A_COMB.seq:*
2: /cgn2_6/ptodata/2/ina/5B_COMB.seq:*
3: /cgn2_6/ptodata/2/ina/6A_COMB.seq:*
4: /cgn2_6/ptodata/2/ina/6B_COMB.seq:*
5: /cgn2_6/ptodata/2/ina/PCTUS_COMB.seq:*
6: /cgn2_6/ptodata/2/ina/backfiles1.seq:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
ALIGNMENTS



RESULT 1 
US-08-232-463-14/C 

; Sequence 14, Application US/08232463 

; Patent No. 5670367 

; GENERAL INFORMATION: 

APPLICANT: DORNER, F.

APPLICANT: SCHEIFLINGER, F.

APPLICANT: FALKNER, F. G.



TITLE OF INVENTION: RECOMBINANT FOWLPOX VIRUS 
NUMBER OF SEQUENCES: 52 
CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Foley & Lardner 

STREET: 1800 Diagonal Road, Suite 500 
; CITY: Alexandria 

STATE: VA 

COUNTRY: USA 

ZIP: 22313-0299 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/232,463

FILING DATE:

CLASSIFICATION: 435
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US/07/935,313 

FILING DATE: 

APPLICATION NUMBER: EP 91 114 300.6 

FILING DATE: 26-AUG-1991 
ATTORNEY/AGENT INFORMATION: 

NAME: BENT, Stephen A. 

REGISTRATION NUMBER: 29,768 

REFERENCE/DOCKET NUMBER: 30472/114 IMMU 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703)836-9300

TELEFAX: (703)683-4109 

TELEX: 899149 
; INFORMATION FOR SEQ ID NO: 14: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 7218 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
IMMEDIATE SOURCE: 

CLONE: pTZgpt-Fls 
US-08-232-463-14 



Query Match 9.3%; Score 41.4; DB 1; Length 7218; 

Best Local Similarity 3.1%; Pred. No. 0.00035; 

Matches 12; Conservative 211; Mismatches 163; Indels 0; Gaps 0; 

Qy 2 gaaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaa 61 

| | | | I I I I I I :::::: : : : :::::::: : : : : :::::: 
Db 1448 GAAGAATTTGGTACRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 1389 

Qy 62 taatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaa 121 

Db 1388 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 132 9 

Qy 122 agtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggca 181 



Db 1328 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 12 69 



Qy 182 tcgggcaacatacttgttagccgtaatgacgacgggccatgctatctagattccggtctt 241 
Db 12 68 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 120 9 



Qy 242 aatgagtacgtctgcagaaagactaataagtgctataagagcttggtgctctgcgtggcg 301

Db 1208 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 114 9 

Qy 302 agttgtcaaccatcatcatgaattcaagatactgcggagacatcatgatactgcggagac 361 

Db 114 8 RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR 108 9 

Qy 362 agacggccagagatgangctagctag 387 

: : : : : :::::::: : : I I 

Db 1088 RRRRRRRRRRRRRRRRRRRRRRATCG 1063 



RESULT 2 
US-08-204-656B-3/C 

Sequence 3, Application US/08204656B 
Patent No. 5538882 
GENERAL INFORMATION: 



APPLICANT: Matsui, Ikuo
APPLICANT: Ishikawa, Kazuhiko
APPLICANT: Miyairi, Sachio
APPLICANT: Honda, Koichi

TITLE OF INVENTION: Variant-Type Carbohydrate Hydrolase,
TITLE OF INVENTION: Variant Gene Of The Enzyme And Method For Producing
TITLE OF INVENTION: Oligosaccharide Using The Enzyme
NUMBER OF SEQUENCES: 18 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Birch, Stewart, Kolasch & Birch, LLP 

STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 

STATE: Virginia 

COUNTRY: U.S.A. 

ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/204,656B

FILING DATE: 02-MAR-1994 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION:

NAME: Weiner, Marc S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 

TELEX: 248345 
INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 



LENGTH: 1404 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
MOLECULE TYPE: other nucleic acid 

DESCRIPTION: /desc = "Synthetic nucleic acid" 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
IMMEDIATE SOURCE: 
; CLONE: Derived from plasmid pSf s l (Agric. Biol. Chem. 

FEATURE: 

NAME/KEY: CDS 

LOCATION: 1..1404 

OTHER INFORMATION: /note= "Nucleotides 1-1404 
OTHER INFORMATION: correspond to nucleotides 79-1482 in the 
Saccharomycopsis 

; OTHER INFORMATION: fibuligera α-amylase structural gene"

US-08-204-656B-3 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I 11111111111 I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II I I I I I I I I III I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 3 
US-08-204-656B-5/C 

Sequence 5, Application US/08204656B 
Patent No. 5538882 
GENERAL INFORMATION: 

APPLICANT: Matsui, Ikuo 
APPLICANT: Ishikawa, Kazuhiko 
APPLICANT: Miyairi, Sachio 
APPLICANT: Honda, Koichi 

TITLE OF INVENTION: Variant-Type Carbohydrate Hydrolase, 
TITLE OF INVENTION: Variant Gene Of The Enzyme And Method For Producing 
TITLE OF INVENTION: Oligosaccharide Using The Enzyme 
NUMBER OF SEQUENCES: 18 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Birch, Stewart, Kolasch & Birch, LLP 
STREET: 8110 Gatehouse Road, Suite 500 East 
CITY: Falls Church 
STATE: Virginia 
COUNTRY: U.S.A. 
ZIP: 22042 



COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/204,656B

FILING DATE: 02-MAR-1994 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
; NAME: Weiner, Marc S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 

TELEX: 248345 
; INFORMATION FOR SEQ ID NO: 5: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1404 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
MOLECULE TYPE: other nucleic acid 

DESCRIPTION: /desc = "Synthetic DNA" 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
IMMEDIATE SOURCE: 
; CLONE: Derived from plasmid pSf s l (Agric. Biol. Chem. 

FEATURE: 

NAME/KEY: CDS

LOCATION: 1..1404

OTHER INFORMATION: /note= "Nucleotides 1-1404 

OTHER INFORMATION: correspond to nucleotides 79-1482 of the 

Saccharomycopsis

OTHER INFORMATION: fibuligera α-amylase structural gene"

US-08-204-656B-5 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I I I I I I I I I II I I I II I I I I I II I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

Mil II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570 

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 4 



US-08-204-656B-7/C 

Sequence 7, Application US/08204656B 
Patent No. 5538882 
GENERAL INFORMATION: 



APPLICANT: Matsui, Ikuo
APPLICANT: Ishikawa, Kazuhiko
APPLICANT: Miyairi, Sachio
APPLICANT: Honda, Koichi



TITLE OF INVENTION: Variant-Type Carbohydrate Hydrolase, 

TITLE OF INVENTION: Variant Gene Of The Enzyme And Method For Producing 
TITLE OF INVENTION: Oligosaccharide Using The Enzyme 
NUMBER OF SEQUENCES: 18 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Birch, Stewart, Kolasch & Birch, LLP 
STREET: 8110 Gatehouse Road, Suite 500 East 
CITY: Falls Church 
STATE: Virginia 
COUNTRY: U.S.A. 
ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/204,656B
FILING DATE: 02-MAR-1994 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Weiner, Marc S. 
REGISTRATION NUMBER: 32,181 
REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 205-8000 
TELEFAX: (703) 205-8050 
TELEX: 248345 
INFORMATION FOR SEQ ID NO: 7: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1404 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double
TOPOLOGY: linear 
MOLECULE TYPE: other nucleic acid 

DESCRIPTION: /desc = "Synthetic DNA"
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
IMMEDIATE SOURCE: 

CLONE: Derived from plasmid pSf s l (Agric. Biol. Chem. 
FEATURE: 

NAME/KEY: CDS
LOCATION: 1..1404 

OTHER INFORMATION: /note= "Nucleotides 1-1404 
OTHER INFORMATION: correspond to nucleotides 79-1482 of the 
Saccharomycopsis 

OTHER INFORMATION: fibuligera α-amylase structural gene"
US-08-204-656B-7 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0;



Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

III I I I I I I I I I I I I I I Mill II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

III I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I II I II I II I I I I I I I I I I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 5 
US-08-470-702-2/C 

; Sequence 2, Application US/08470702 
; Patent No. 5631149 
; GENERAL INFORMATION:

APPLICANT: MATSUI, IKUO 

APPLICANT: ISHIKAWA, KAZUHIKO 

APPLICANT: MIYAIRI, SACHIO

APPLICANT: HONDA, KOICHI 

TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE, 

TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING 
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME 
NUMBER OF SEQUENCES: 17 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH . 

STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 
; STATE: Virginia 

COUNTRY: U.S.A. 

ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/470,702

FILING DATE: 06-JUN-1995 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 

FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 

NAME: WEINER, MARC S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 



TELEX: 248345 
INFORMATION FOR SEQ ID NO: 2: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1404 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double 
TOPOLOGY: linear 
MOLECULE TYPE: DNA (synthetic) 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
US-08-470-702-2 

Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I 1 I I I I I I I II I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I II I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 6 
US-08-470-702-3/c 

Sequence 3, Application US/08470702 
Patent No. 5631149 
GENERAL INFORMATION: 



APPLICANT: MATSUI, IKUO
APPLICANT: ISHIKAWA, KAZUHIKO
APPLICANT: MIYAIRI, SACHIO
APPLICANT: HONDA, KOICHI
TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE,
TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME
NUMBER OF SEQUENCES: 17 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH 

STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 

STATE: Virginia 

COUNTRY: U.S.A. 

ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/470,702 



FILING DATE: 06-JUN-1995

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 

FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 

NAME: WEINER, MARC S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 

TELEX: 248345 
; INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1404 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
MOLECULE TYPE: DNA (synthetic) 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
US-08-470-702-3 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II II I I I I I Mill I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I I I I II MM I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 7 
US-08-470-702-4/C 

Sequence 4, Application US/08470702 
Patent No. 5631149 
GENERAL INFORMATION: 



APPLICANT: MATSUI, IKUO
APPLICANT: ISHIKAWA, KAZUHIKO
APPLICANT: MIYAIRI, SACHIO
APPLICANT: HONDA, KOICHI
TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE,
TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME
NUMBER OF SEQUENCES: 17
CORRESPONDENCE ADDRESS:

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH



STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 
; STATE: Virginia 

COUNTRY: U.S.A. 

ZIP: 22042
COMPUTER READABLE FORM:

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/470,702 

FILING DATE: 06-JUN-1995 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 

FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 

NAME: WEINER, MARC S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 

TELEX: 248345 
; INFORMATION FOR SEQ ID NO: 4: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1404 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
MOLECULE TYPE: DNA (synthetic) 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
US-08-470-702-4 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I! I I III I III I I I I I I I II I I I I I I I I I II I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225

I I I I I I I I II I I I I II III I I II I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 8 
US-08-467-831-2/c 

; Sequence 2, Application US/08467831 



; Patent No. 5635378 
; GENERAL INFORMATION:

APPLICANT: MATSUI, IKUO 

APPLICANT: ISHIKAWA, KAZUHIKO 

APPLICANT: MIYAIRI, SACHIO 

APPLICANT: HONDA, KOICHI 

TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE, 

TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING 
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME 
NUMBER OF SEQUENCES: 17 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH 

STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 

STATE: Virginia 

COUNTRY: U.S.A. 

ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/467,831

FILING DATE: 06-JUN-1995 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 

FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 

NAME: WEINER, MARC S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 

TELEX: 248345 
; INFORMATION FOR SEQ ID NO: 2: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1404 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
MOLECULE TYPE: DNA (synthetic) 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
US-08-467-831-2 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630



Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225

I I I I I I I I II II I I I I I I I I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 9 
US-08-467-831-3/C 

Sequence 3, Application US/08467831 
Patent No. 5635378 
GENERAL INFORMATION: 



APPLICANT: MATSUI, IKUO
APPLICANT: ISHIKAWA, KAZUHIKO
APPLICANT: MIYAIRI, SACHIO
APPLICANT: HONDA, KOICHI
TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE,
TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME
NUMBER OF SEQUENCES: 17 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH 

STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 

STATE: Virginia 

COUNTRY: U.S.A. 

ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/467,831

FILING DATE: 06-JUN-1995 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 

FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 

NAME: WEINER, MARC S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 

TELEX: 248345 
INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1404 base pairs 

TYPE: nucleic acid 

STRANDEDNESS : double 

TOPOLOGY: linear 
MOLECULE TYPE: DNA (synthetic) 
HYPOTHETICAL: NO 



ANTI-SENSE : NO 
US-08-467-831-3 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170

III I I I I I I I I I I Mill Mill I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I' I I I I I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 10 
US-08-467-831-4/C 

Sequence 4, Application US/08467831 
Patent No. 5635378 
GENERAL INFORMATION: 



APPLICANT: MATSUI, IKUO
APPLICANT: ISHIKAWA, KAZUHIKO
APPLICANT: MIYAIRI, SACHIO
APPLICANT: HONDA, KOICHI
TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE,
TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME
NUMBER OF SEQUENCES: 17 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH 

STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 

STATE: Virginia 

COUNTRY: U.S.A. 

ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible

OPERATING SYSTEM: PC-DOS/MS-DOS

SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/467,831

FILING DATE: 06-JUN-1995 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 

FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 

NAME: WEINER, MARC S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 



TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 205-8000 
TELEFAX: (703) 205-8050 
TELEX: 248345 
; INFORMATION FOR SEQ ID NO: 4: 

SEQUENCE CHARACTERISTICS: 
LENGTH: 1404 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double 
TOPOLOGY: linear 

MOLECULE TYPE: DNA (synthetic) 

HYPOTHETICAL: NO 

ANTI-SENSE: NO 
US-08-467-831-4 



Query Match 7.7%; Score 34.2; DB 1; Length 1404; 

Best Local Similarity 49.7%; Pred. No. 0.046; 

Matches 87; Conservative 0; Mismatches 88; Indels 0; Gaps 0;

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGCACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 11 
US-08-204-656B-1/C 

Sequence 1, Application US/08204656B 
Patent No. 5538882 
GENERAL INFORMATION: 



APPLICANT: Matsui, Ikuo
APPLICANT: Ishikawa, Kazuhiko
APPLICANT: Miyairi, Sachio
APPLICANT: Honda, Koichi
TITLE OF INVENTION: Variant-Type Carbohydrate Hydrolase,
TITLE OF INVENTION: Variant Gene Of The Enzyme And Method For Producing
TITLE OF INVENTION: Oligosaccharide Using The Enzyme
NUMBER OF SEQUENCES: 18 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Birch, Stewart, Kolasch & Birch, LLP 

STREET: 8110 Gatehouse Road, Suite 500 East 

CITY: Falls Church 

STATE: Virginia 

COUNTRY: U.S.A. 

ZIP : 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 



SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/204,656B

FILING DATE: 02-MAR-1994 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 

NAME: Weiner, Marc S. 

REGISTRATION NUMBER: 32,181 

REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (703) 205-8000 

TELEFAX: (703) 205-8050 

TELEX: 248345 
; INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1404 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double 

TOPOLOGY: linear 
MOLECULE TYPE: other nucleic acid 

DESCRIPTION: /desc = "Synthetic DNA"
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
IMMEDIATE SOURCE: 
; CLONE: Derived from plasmid pSf s l (Agric. Biol. Chem. 

FEATURE: 

NAME/KEY: CDS

LOCATION: 1..1404

OTHER INFORMATION: /note= "Nucleotides 1-1404
OTHER INFORMATION: correspond to nucleotides 79-1482 in the Saccharomycopsis
OTHER INFORMATION: fibuligera alpha-amylase structural gene"

US-08-204-656B-1 



Query Match 7.3%; Score 32.6; DB 1; Length 1404;

Best Local Similarity 49.1%; Pred. No. 0.17; 

Matches 86; Conservative 0; Mismatches 89; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

II! I II I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGGACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630 

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II MINI I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 12 
US-08-470-702-1/C 

; Sequence 1, Application US/08470702 
; Patent No. 5631149 
; GENERAL INFORMATION: 



APPLICANT: MATSUI, IKUO
APPLICANT: ISHIKAWA, KAZUHIKO
APPLICANT: MIYAIRI, SACHIO
APPLICANT: HONDA, KOICHI
TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE,
TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME
NUMBER OF SEQUENCES: 17 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH 
STREET: 8110 Gatehouse Road, Suite 500 East 
CITY: Falls Church 
STATE: Virginia 
COUNTRY: U.S.A. 
ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/470,702 
FILING DATE: 06-JUN-1995 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 
FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 
NAME: WEINER, MARC S. 
REGISTRATION NUMBER: 32,181 
REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 205-8000 
TELEFAX: (703) 205-8050 
TELEX: 248345 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1404 base pairs 
TYPE: nucleic acid 
STRANDEDNESS : double 
TOPOLOGY: linear 
MOLECULE TYPE: DNA (synthetic) 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
US-08-470-702-1 



Query Match 7.3%; Score 32.6; DB 1; Length 1404; 

Best Local Similarity 49.1%; Pred. No. 0.17; 

Matches 86; Conservative 0; Mismatches 89; Indels 0; Gaps 0; 

Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 689 CCTACTGAGTAAACTCCAGATGGACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630



Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 
I I I I II I I I I I I I I I I I I I I I I I I III 



Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570



Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II I I I I II III I I I I .1 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 13 
US-08-467-831-1/c

Sequence 1, Application US/08467831 
Patent No. 5635378 
GENERAL INFORMATION: 

APPLICANT: MATSUI, IKUO 
APPLICANT: ISHIKAWA, KAZUHIKO 
APPLICANT: MIYAIRI, SACHIO
APPLICANT: HONDA, KOICHI 

TITLE OF INVENTION: VARIANT-TYPE CARBOHYDRATE HYDROLASE, 

TITLE OF INVENTION: VARIANT GENE OF THE ENZYME AND METHOD FOR PRODUCING 
TITLE OF INVENTION: OLIGOSACCHARIDE USING THE ENZYME 
NUMBER OF SEQUENCES: 17 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: BIRCH, STEWART, KOLASCH & BIRCH
STREET: 8110 Gatehouse Road, Suite 500 East 
CITY: Falls Church 
STATE: Virginia 
COUNTRY: U.S.A. 
ZIP: 22042 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/467,831
FILING DATE: 06-JUN-1995 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/204,656 
FILING DATE: 02-MAR-1994 
ATTORNEY/AGENT INFORMATION: 
NAME: WEINER, MARC S. 
REGISTRATION NUMBER: 32,181 
REFERENCE/DOCKET NUMBER: 234-252P 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (703) 205-8000 
TELEFAX: (703) 205-8050 
TELEX: 248345 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1404 base pairs 
TYPE: nucleic acid 
STRANDEDNESS : double 
TOPOLOGY: linear 
MOLECULE TYPE: DNA (synthetic) 
HYPOTHETICAL: NO 
ANTI-SENSE: NO 
US-08-467-831-1 



Query Match 7.3%; Score 32.6; DB 1; Length 1404; 

Best Local Similarity 49.1%; Pred. No. 0.17; 

Matches 86; Conservative 0; Mismatches 89; Indels 0; Gaps 0;



Qy 51 cctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagc 110 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I ! I 
Db 689 CCTACTGAGTAAACTCCAGATGGACTAACAAAATCCGGGAAAAAGCCTTGGTCCACATGT 630

Qy 111 aagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaa 170 

I I I I II I I I I I I I I I I I I I I I I I I III 

Db 629 TTAGCACTATCAATTCTTAAACCATCAATTGAGTAATTGCCAACAAAATCTTTAACCCAA 570

Qy 171 agacgagggcatcgggcaacatacttgttagccgtaatgacgacgggccatgcta 225 

I I I I I I I I II MM II III I I I I I 

Db 569 GAATTGAAAACTGAGGCCACGTCGCTATCTTCCGTTCTCAAATCTGGTAATGCAA 515 



RESULT 14 
PCT-US95-05008-5 

Sequence 5, Application PC/TUS9505008 
GENERAL INFORMATION: 



APPLICANT: Sugen, Inc.
APPLICANT: 515 Galveston Drive
APPLICANT: Redwood City, California 94063-4720
APPLICANT: United States of America
APPLICANT: Wissenschaften E.V.
APPLICANT: Hofgarten Str. 2
APPLICANT: Munchen 80539
APPLICANT: Germany
TITLE OF INVENTION: Novel Megakaryocytic Protein Tyrosine
TITLE OF INVENTION: Kinases
NUMBER OF SEQUENCES: 21
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Pennie & Edmonds 
STREET: 1155 Avenue of the Americas 
CITY: New York 
STATE: New York 
COUNTRY: U.S.A. 
ZIP: 10036 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.25
CURRENT APPLICATION DATA:

APPLICATION NUMBER: PCT/US 95/05008
FILING DATE: 24-APR-1995
CLASSIFICATION:
PRIOR APPLICATION DATA:

APPLICATION NUMBER: US 08/232,545
FILING DATE: 22-APR-1994
CLASSIFICATION:
ATTORNEY/AGENT INFORMATION:
NAME: Coruzzi, Laura A.
REGISTRATION NUMBER: 30,742
REFERENCE/DOCKET NUMBER: 7683-074
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (212)790-9090

TELEFAX: (212)869-9741 

TELEX: 66141 PENNIE 
; INFORMATION FOR SEQ ID NO: 5: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 2770 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: unknown 

TOPOLOGY: unknown 
MOLECULE TYPE: DNA 
PCT-US95-05008-5 



Query Match 7.2%; Score 31.8; DB 5; Length 2770;

Best Local Similarity 50.3%; Pred. No. 0.45;

Matches 78; Conservative 0; Mismatches 77; Indels 0; Gaps 0;

Qy 217 gccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgcta 276

I I I I II I I I I I I I I I I I I I I I I I I I I II

Db 1534 GACACGAAATAAAGCTGCCGGTGAAGTGGACTGCGCCCGAAGCCATTCGTAGTAATAAAT 1593

Qy 277 taagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattcaagatactgc 336

III I III II III I I I I 1 I I I I I I I I I I I I I I
Db 1594 TCAGCATTAAGTCCGATGTATGGTCATTTGGAATCCTTCTTTATGAAATCATTACTTATG 1653

Qy 337 ggagacatcatgatactgcggagacagacggccag 371

III I I I I I I I I I I I I I I II
Db 1654 GCAAAATGCCTTACAGTGGTATGACAGGTGCCCAG 1688



RESULT 15 
US-08-222-616-19/C 

Sequence 19, Application US/08222616 
Patent No. 5635177 
GENERAL INFORMATION: 

APPLICANT: Bennett, Brian D. 
APPLICANT: Goeddel, David 
APPLICANT: Lee, James M. 
APPLICANT: Matthews, William 
APPLICANT: Tsai, Siao Ping 
APPLICANT: Wood, William I. 

TITLE OF INVENTION: PROTEIN TYROSINE KINASE AGONIST 
TITLE OF INVENTION: ANTIBODIES 
NUMBER OF SEQUENCES: 42 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Genentech, Inc. 
STREET: 460 Point San Bruno Blvd
CITY: South San Francisco 
STATE: California 
COUNTRY: USA 
ZIP: 94080 
COMPUTER READABLE FORM: 

MEDIUM TYPE: 5.25 inch, 360 Kb floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 



SOFTWARE: patin (Genentech) 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/222,616 

FILING DATE: 4-APR-1994 

CLASSIFICATION: 530 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/US93/00586

FILING DATE: 22-JAN-1993 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 07/826935 

FILING DATE: 22-JAN-1992 
ATTORNEY/AGENT INFORMATION: 

NAME: Lee, Wendy M. 

REGISTRATION NUMBER: 

REFERENCE/DOCKET NUMBER: 821P2 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: 415/225-1994 

TELEFAX: 415/952-9881 

TELEX: 910/371-7168 
INFORMATION FOR SEQ ID NO: 19: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 7607 bases 

TYPE: nucleic acid 

STRANDEDNESS : single 

TOPOLOGY: linear 



Query Match 7.2%; Score 31.8; DB 1; Length 7607;

Best Local Similarity 50.3%; Pred. No. 0.76; 

Matches 78; Conservative 0; Mismatches 77; Indels 0; Gaps 0;

Qy 217 gccatgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgcta 276

I I I I II I I I I I I I I I I I I I I I I I I I I II 

Db 2207 GACACGAAATAAAGCTGCCGGTGAAGTGGACTGCGCCCGAAGCCATTCGTAGTAATAAAT 2148

Qy 277 taagagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattcaagatactgc 336 

III I III II III I I II I I I I I I I I I I I II I I 
Db 2147 TCAGCATTAAGTCCGATGTATGGTCATTTGGAATCCTTCTTTATGAAATCATTACTTATG 2088

Qy 337 ggagacatcatgatactgcggagacagacggccag 371 

III I I I I I I 11 I I I I I I I I 
Db 2087 GCAAAATGCCTTACAGTGGTATGACAGGTGCCCAG 2053 



Search completed: February 7, 2002, 10:51:46 
Job time: 6072 sec 

GenCore version 4.5 
Copyright (c) 1993 - 2000 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on:          February 7, 2002, 08:20:41 ; Search time 4942.22 Seconds
                 (without alignments)
                 965.381 Million cell updates/sec

Title:           US-09-394-745-6154
Perfect score:   444
Sequence:        1 cgaaaacactggtacccaaa tcccattttaagaaataaat 444

Scoring table:   IDENTITY_NUC
                 Gapop 10.0 , Gapext 1.0

Searched:        11351937 seqs, 5372889281 residues

Total number of hits satisfying chosen parameters:    22703874

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries
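
The throughput figure in the header can be checked by hand. The following is a minimal sketch in Python, not part of the GenCore output. It assumes that one cell update is one query-residue by database-residue comparison and that both strands of every database sequence are searched (an assumption suggested by the hit count, 22703874, being exactly twice the 11351937 sequences searched). The function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Estimate search throughput in millions of dynamic-programming cells per second.

    Assumes each query residue is compared against each database residue
    once per strand searched (strands=2 is an assumption, see lead-in).
    """
    cells = strands * query_len * db_residues
    return cells / seconds / 1e6


# Values copied from the header of this search report.
rate = million_cell_updates_per_sec(
    query_len=444,            # query length (equals the perfect score under IDENTITY_NUC)
    db_residues=5372889281,   # residues searched
    seconds=4942.22,          # search time without alignments
)
print(f"{rate:.3f} million cell updates/sec")
```

Under these assumptions the result agrees with the 965.381 Million cell updates/sec reported in the header.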



Database:        EST:*
                 1:  em_estfun:*
                 2:  em_esthum:*
                 3:  em_estin:*
                 4:  em_estom:*
                 5:  em_estpl:*
                 6:  em_estba:*
                 7:  em_estro:*
                 8:  em_estov:*
                 9:  em_htc:*
                 10:  gb_est1:*
                 11:  gb_est2:*
                 12:  gb_htc:*
                 13:  gb_gss:*
                 14:  em_gss_fun:*
                 15:  em_gss_hum:*
                 16:  em_gss_inv:*
                 17:  em_gss_pln:*
                 18:  em_gss_pro:*
                 19:  em_gss_rod:*
                 20:  em_gss_vrt:*
                 21:  em_gss_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
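
Two of the other per-result statistics in this report, Query Match and Best Local Similarity, can be reproduced from the printed numbers. The sketch below is illustrative Python, not GenCore code. It assumes that Query Match is the score as a percentage of the perfect score, and that Best Local Similarity is the number of matches as a percentage of all aligned columns (matches, conservative substitutions, mismatches and indels). Both formulas are inferred from the figures in this report, and the function names are hypothetical. Pred. No. is not reproduced because it depends on the full score distribution.

```python
def query_match_pct(score, perfect_score):
    """Score as a percentage of the perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Matches as a percentage of all aligned columns."""
    aligned = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned, 1)


PERFECT_SCORE = 444  # from the search header

# EST result 1 (BG837106): Score 156.4; Matches 184; Conservative 12;
# Mismatches 34; Indels 4.
print(query_match_pct(156.4, PERFECT_SCORE))
print(best_local_similarity_pct(184, 12, 34, 4))

# Patent result US-08-470-702-2: Score 34.2; Matches 87; Mismatches 88.
print(query_match_pct(34.2, PERFECT_SCORE))
print(best_local_similarity_pct(87, 0, 88, 0))
```

With the values above, the sketch gives the same percentages as the report prints for those two results (35.2% and 78.6%; 7.7% and 49.7%).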



SUMMARIES 



Result          Query
   No.   Score  Match  Length  DB  ID        Description

c    1   156.4   35.2     261  11  BG837106  BG837106 Zm08_06h0
c    2   121.2   27.3     129  10  AI673919  AI673919 605039B05
     3      45   10.1     309  11  BG240000  BG240000 OV1_31_G0
     4      45   10.1     435  11  BG240586  BG240586 OV1_31_G0
     5    44.6   10.0    1068  11  BG326023  BG326023 602424785
ALIGNMENTS



RESULT 1 
BG837106/C 
LOCUS       BG837106      261 bp    mRNA            EST       25-MAY-2001
DEFINITION  Zm08_06h09_A
            Zm08_AAFC_ECORC_Fusarium_graminearum_inoculated_corn_ear Zea mays
            cDNA clone Zm08_06h09, mRNA sequence.
ACCESSION   BG837106
VERSION     BG837106.1  GI:14203429
KEYWORDS    EST.
SOURCE      Zea mays.
  ORGANISM  Zea mays
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Zea.
REFERENCE 1 (bases 1 to 261) 

AUTHORS Harris, L. J. , Balcerzak, M . , Allard,S., Saparno,A., Couroux,P., De 

Moors, A., Hattori, J. I . , Ouellet,T., Robert, L.S., Singh, J. A, Sprott 
, D. and Tinker, N. A. 
TITLE Expressed Sequence Tags from Developing Maize Kernels Six Days 

after Silk Channel Inoculation with Fusarium graminearum 
JOURNAL Unpublished (2001) 
COMMENT Contact: Harris, Linda J. 

Eastern Cereal and Oilseed Research Centre 
Agriculture and Agri-food Canada 

Bldg. 21, Central Experimental Farm, Ottawa, Ontario, K1A 0C6, 
CANADA 

Tel: (613) 759-1314 
Fax: (613) 759-6566 
Email: harrislj@em.agr.ca.
FEATURES Location/Qualifiers

source 1..261

/organism="Zea mays" 
/cultivar="CO430" 
/db_xref="taxon:4577" 
/clone="Zm08_06h09" 

/clone_lib="Zm08_AAFC_ECORC_Fusarium_graminearum_inoculate
d_corn_ear"

/tissue_type="Developing kernels (sibcrossed)"
/dev_stage="10-11 days post-silk emergence"
/note="Vector: Bluescript SK+/XhoI-EcoRI; Site_1: EcoRI;
Site_2: XhoI; Field-grown maize ears were silk
channel-inoculated in the morning (-10 am) with 1 ml of a 
Fusarium graminearum macroconidial suspension (500,000 
spores/ml) and whole ears were collected and immediately 
frozen in liquid nitrogen 6 days later." 

BASE COUNT 65 a 60 c 59 g 61 t 16 others 

ORIGIN 



Query Match 35.2%; Score 156.4; DB 11; Length 261; 

Best Local Similarity 78.6%; Pred. No. 1.7e-32; 

Matches 184; Conservative 12; Mismatches 34; Indels 4; Gaps 1; 

Qy 32 ccaagggcaaattcaacaacctccaaagaataatccgggtgccttccaagaatcctccaa 91 

I : I I : I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 230 CMTGGGCGVAGTGCSGCAGCTTCCAAGGATTATTCTGGTTGCTTTCCATGATTCTTCTCG 171 

Qy 92 ccacccttggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgct 151 

I : II III III I III II I : I I I I I I I I I I I I I II I : : I I : I I I I I I I 

Db 170 CMTCCTTTGTTGCTAATGCACGCACC----MGTGGGCAAACCAAAGARRACMGAAATGCT 115

Qy 152 aggaaaatgacgatgacaaagacgagggcatcgggcaacatacttgttagccgtaatgac 211 

I I I M I I I I I II I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I : I I I I I I I I : I I I II 
Db 114 AGGAACATGACGATGACCAAGACGAGGGCATCGGGCMACATACTTSTTAGCCGTMATGAC 55 



Qy 212 gacgggccatgctatctagattccggtcttaatgagtacgtctgcagaaagact 265 

I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 54 GMCGGGCCATGCTWTCTAGATTCCGGTCTTAATGAGTACGTCTGCAGAAAGACT 1 



RESULT 2 
AI673919/C 
LOCUS       AI673919      129 bp    mRNA            EST       02-FEB-2000
DEFINITION  605039B05.x1 605 - Endosperm cDNA library from Schmidt lab Zea mays
            cDNA, mRNA sequence.
ACCESSION   AI673919
VERSION     AI673919.1  GI:4874399
KEYWORDS    EST.
SOURCE      Zea mays.
  ORGANISM  Zea mays
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Zea.
REFERENCE   1  (bases 1 to 129)
  AUTHORS   Walbot,V.
  TITLE     Maize ESTs from various cDNA libraries sequenced at Stanford
            University
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Walbot V
            Department of Biological Sciences
            Stanford University
            855 California Ave, Palo Alto, CA 94304, USA
            Tel: 650 723 2227
            Fax: 650 725 8221
            Email: walbot@stanford.edu
            Plate: 605039 row: B column: 05.
FEATURES             Location/Qualifiers
     source          1..129
                     /organism="Zea mays"
                     /cultivar="Ohio43"
                     /db_xref="taxon:4577"
                     /clone_lib="605 - Endosperm cDNA library from Schmidt lab"
                     /tissue_type="nucellar, embryo, and endosperm"
                     /dev_stage="10-14 days post-pollination"
                     /lab_host="DH5 (alpha)"
                     /note="Organ: Kernel; Vector: pAD-GAL4-2.1; Site_1: EcoRI;
                     Site_2: XhoI; Kernel endosperm cDNA library from Schmidt
                     lab"
BASE COUNT       32 a     26 c     25 g     46 t
ORIGIN



Query Match 27.3%; Score 121.2; DB 10; Length 129; 

Best Local Similarity 95.3%; Pred. No. 6.1e-23; 

Matches 123; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 

Qy 316 atcatgaattcaagatactgcggagacatcatgatactgcggagacagacggccagagat 375 

I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I 

Db 129 ATCATGAATTCATGATACTGCGGAGACATCATGATACTGCGGAGACAGACGGCGAGAGAT 70

Qy 376 gangctagctagatgccgtttcaccannatattatgtaacacccaaatctcccattttaa 435

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 69 GAGGCTAGCTAGATGCTGTTTCACCAAAATATTATGTAACACCCAAATCTCCCATTTTAA 10 



Qy 436 gaaataaat 444 
I I I I II I I I 



Db 9 GAAATAAAT 1 



RESULT 3 

BG240000 

LOCUS       BG240000      309 bp    mRNA            EST       15-FEB-2001
DEFINITION  OV1_31_G02.g1_A002 Ovary 1 (OV1) Sorghum bicolor cDNA, mRNA
            sequence.
ACCESSION   BG240000
VERSION     BG240000.1  GI:12775073
KEYWORDS    EST.
SOURCE      sorghum.
  ORGANISM  Sorghum bicolor
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Sorghum.
REFERENCE   1  (bases 1 to 309)
  AUTHORS   Cordonnier-Pratt,M.-M., Gingle,A., Marsala,C., Sudman,M. and Pratt
            ,L.H.
  TITLE     An EST database from Sorghum: ovaries of varying immature stages
  JOURNAL   Unpublished (2000)
COMMENT     Contact: Cordonnier-Pratt MM
            Department of Botany
            The University of Georgia
            Plant Sciences Building, Rm. 2502, Athens, GA 30602-7271, USA
            Tel: 706 542 1860
            Fax: 706 542 1805
            Email: mmpratt@uga.edu
            Sequences have been trimmed to exclude PolyA, vector and regions
            below Phred quality 16. The threshold for highest quality sequence
            is 20.
            Seq primer: PolyTMix
            High quality sequence stop: 305
            POLYA=No.
FEATURES             Location/Qualifiers
     source          1..309
                     /organism="Sorghum bicolor"
                     /db_xref="taxon:4558"
                     /clone_lib="Ovary 1 (OV1)"
                     /note="Organ: Mix of ovaries of varying immature stages
                     from 8-week-old plants; Vector: pBluescript II from Lambda
                     Zap II; Site_1: XhoI; Site_2: EcoRI; The library was made
                     from poly-A RNA in the cloning vector lambda ZAP II.
                     Clones to be sequenced were prepared by mass excision."
BASE COUNT       95 a     65 c     76 g     73 t
ORIGIN



Query Match 10.1%; Score 45; DB 11; Length 309; 

Best Local Similarity 63.9%; Pred. No. 0.049; 

Matches 85; Conservative 0; Mismatches 45; Indels 3; Gaps 1; 

Qy 69 ggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggc 128 

II I I i I I I I I I I I I I I II I I I I I I I I I I I I I I 

Db 94 GGCTTCTTTCCATGGTTCTTCTGGCATCCTCAGTTGTTTATGCACGCACAATAAATGGGC 153 

Qy 129 aaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcatcgggca 188



Db 154 AAACCAAAGAGGACATCAACACCAGGAGTGTGACGATGAT---GACAAGGTCAGCAAGCT 210



Qy 189 acatacttgttag 201 

I I I I III III 
Db 211 CCATAATTGGTAG 223 
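
The summary statistics reported for this hit are consistent with two simple ratios. The Python sketch below recomputes them. It assumes (this is inferred from the reported numbers, not stated by the GenCore output) that Query Match is the score divided by the perfect score of 444 given in the search header, and that Best Local Similarity is the number of matches divided by the number of alignment columns. The function names are illustrative only.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score expressed as a percentage of the perfect (self-match) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, conservative: int,
                              mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Values reported for RESULT 3 (BG240000); 444 is the perfect score.
print(round(query_match_pct(45, 444), 1))                 # 10.1
print(round(best_local_similarity_pct(85, 0, 45, 3), 1))  # 63.9
```

The same two ratios also reproduce the figures printed for the other hits in this listing, for example 53 matches over 325 columns for RESULT 6.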



RESULT 4 

BG240586 

LOCUS       BG240586      435 bp    mRNA            EST       15-FEB-2001
DEFINITION  OV1_31_G02.b1_A002 Ovary 1 (OV1) Sorghum bicolor cDNA, mRNA
            sequence.
ACCESSION   BG240586
VERSION     BG240586.1  GI:12775659
KEYWORDS    EST.
SOURCE      sorghum.
  ORGANISM  Sorghum bicolor
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; PACC
            clade; Panicoideae; Andropogoneae; Sorghum.
REFERENCE   1  (bases 1 to 435)
  AUTHORS   Cordonnier-Pratt,M.-M., Gingle,A., Marsala,C., Sudman,M. and
            Pratt,L.H.
  TITLE     An EST database from Sorghum: ovaries of varying immature stages
  JOURNAL   Unpublished (2000)
COMMENT     Contact: Cordonnier-Pratt MM
            Department of Botany
            The University of Georgia
            Plant Sciences Building, Rm. 2502, Athens, GA 30602-7271, USA
            Tel: 706 542 1860
            Fax: 706 542 1805
            Email: mmpratt@uga.edu
            Sequences have been trimmed to exclude PolyA, vector and regions
            below Phred quality 16. The threshold for highest quality sequence
            is 20.
            Seq primer: JEN REV
            High quality sequence stop: 430
            POLYA=No.
FEATURES             Location/Qualifiers
     source          1..435
                     /organism="Sorghum bicolor"
                     /db_xref="taxon:4558"
                     /clone_lib="Ovary 1 (OV1)"
                     /note="Organ: Mix of ovaries of varying immature stages
                     from 8-week-old plants; Vector: pBluescript II from Lambda
                     Zap II; Site_1: XhoI; Site_2: EcoRI; The library was made
                     from poly-A RNA in the cloning vector lambda ZAP II.
                     Clones to be sequenced were prepared by mass excision."
BASE COUNT      130 a     89 c    104 g    112 t
ORIGIN



Query Match 10.1%; Score 45; DB 11; Length 435;

Best Local Similarity 63.9%; Pred. No. 0.053;

Matches 85; Conservative 0; Mismatches 45; Indels 3; Gaps 1;



Qy 69 ggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggc 128 

II I I I I I I I I I I I I I I II I I I I I I I I I II I I I 

Db 94 GGCTTCTTTCCATGGTTCTTCTGGCATCCTCAGTTGTTTATGCACGCACAATAAATGGGC 153 

Qy 129 aaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcatcgggca 188 

I I I I I I I I I I I I I I I I I I I I I II I I I It It I I I I I I I I I I I II 
Db 154 AAACCAAAGAGGACATCAACACCAGGAGTGTGACGATGAT---GACAAGGTCAGCAAGCT 210

Qy 189 acatacttgttag 201 

I I I I III I II 
Db 211 CCATAATTGGTAG 223 
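
Each hit's summary block follows the same "Name value;" pattern, so it can be read mechanically. The Python sketch below is a minimal example, assuming that fields are separated by semicolons and that each value is the last whitespace-separated token of its field. `parse_hit_stats` is a hypothetical helper, not part of GenCore.

```python
def parse_hit_stats(block: str) -> dict:
    """Parse 'Name value;' pairs from a hit summary block into a dict."""
    stats = {}
    for field in block.replace("\n", " ").split(";"):
        field = field.strip()
        if not field:
            continue
        # The value is the last token; everything before it is the name.
        name, _, value = field.rpartition(" ")
        stats[name.strip()] = value
    return stats


block = """Query Match 10.1%; Score 45; DB 11; Length 435;
Best Local Similarity 63.9%; Pred. No. 0.053;
Matches 85; Conservative 0; Mismatches 45; Indels 3; Gaps 1;"""

stats = parse_hit_stats(block)
print(stats["Score"], stats["Pred. No."], stats["Gaps"])  # 45 0.053 1
```

Values are kept as strings so that percentages such as "10.1%" survive unchanged; callers can convert them as needed.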



RESULT 5 
BG326023 

LOCUS       BG326023     1068 bp    mRNA            EST       27-FEB-2001
DEFINITION  602424785F1 NIH_MGC_14 Homo sapiens cDNA clone IMAGE:4562781 5',
            mRNA sequence.
ACCESSION   BG326023
VERSION     BG326023.1  GI:13132460
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1068)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: DCTD/DTP
            cDNA Library Preparation: Ling Hong/Rubin Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLCM1275 row: i column: 22
            High quality sequence stop: 495.
FEATURES             Location/Qualifiers
     source          1..1068
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:4562781"
                     /clone_lib="NIH_MGC_14"
                     /tissue_type="renal cell adenocarcinoma"
                     /lab_host="DH10B (phage-resistant)"
                     /note="Organ: kidney; Vector: pOTB7; Site_1: XhoI; Site_2:
                     EcoRI; cDNA made by oligo-dT priming. Directionally
                     cloned into EcoRI/XhoI sites using the following 5'
                     adaptor: GGCACGAG(G). Size-selected >500bp for average
                     insert size 1.8kb. Library constructed by Ling Hong in
                     the laboratory of Gerald M. Rubin (University of
                     California, Berkeley) using ZAP-cDNA synthesis kit
                     (Stratagene) and Superscript II RT (Life Technologies)."
BASE COUNT      352 a    253 c    297 g    166 t
ORIGIN



Query Match 10.0%; Score 44.6; DB 11; Length 1068; 

Best Local Similarity 53.8%; Pred. No. 0.083; 

Matches 92; Conservative 0; Mismatches 79; Indels 0; Gaps 0; 

Qy 1 cgaaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaaga 60 

III I I I I I I I I I I I I I III I I I I II I I I I 
Db 697 CGGACTCCCAGATACTGGAAACAACGCGGACCAAGAAACAAAGGAAAAGAAGGCAAACAG 756 

Qy 61 ataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaa 120 

I II I I II III I III I I I I I I I I I I I I I I I I I I I I I 

Db 757 ACCACCAAGTACACACGCAGGAAAGCAGCAAAGACCGTAGCAGCCAAAGCAAGCCACAGA 816

Qy 121 aagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaa 171 

II I I I I I III III Ml I MM ill II I I 

Db 817 AACAGGACACAAGAAACCAGGCACAAACGAGAAAAAAAAAAGGAGAACACA 867



RESULT 6 
CNS015XR/C 
LOCUS       CNS015XR     1159 bp    DNA             GSS       26-JUL-1999
DEFINITION  Drosophila melanogaster genome survey sequence T7 end of BAC
            BACN15017 of DrosBAC library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL106041
VERSION     AL106041.1  GI:5619746
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Plasmid Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1159)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-JUL-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the European Drosophila Genome Project (EDGP) -
            http://www.edgp.ebi.ac.uk This Drosophila melanogaster BAC
            library (Dros BAC) was made by Alain Billaud at CEPH (Centre
            d'Etude du Polymorphisme Humain) with funding provided by a MRC
            project grant. The DNA was prepared from embryos by Alain Bucheton
            and Genevieve Payan. It has been constructed in the vector
            pBeloBAC11.
FEATURES             Location/Qualifiers
     source          1..1159
                     /organism="Drosophila melanogaster"
                     /plasmid="pBeloBAC11"
                     /db_xref="taxon:7227"
                     /clone_lib="DrosBAC"
                     /clone="BACN15017"
                     /note="end : T7"
BASE COUNT      448 a     36 c      7 g    178 t    490 others
ORIGIN



Query Match 9.2%; Score 41; DB 13; Length 1159; 

Best Local Similarity 16.3%; Pred. No. 0.82; 

Matches 53; Conservative 145; Mismatches 127; Indels 0; Gaps 0; 

Qy 40 aaattcaacaacctccaaagaataatccgggtgccttccaagaatcctccaaccaccctt 99 

II : : I : : I : : : I I : : I ::::::::: : : : : : : | : 
Db 1159 ASASASVAVVAVASASVAAVVSAAVSSVSSSASASASSSSSSSSSASSSSSMAAAAGVVS 1100 

Qy 100 ggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaat 159 

: : : : I I : : : ::::::: | : : : : : | : : | : | : | | : : | : : I : M : I 
Db 1099 ASARSASAAVSVSAVSVSVVVASVSAVSVSVAVVASSSASAAAARSAVAVAVAVAAAVAA 1040 

Qy 160 gacgatgacaaagacgagggcatcgggcaacatacttgttagccgtaatgacgacgggcc 219 

: I : I | : | |: | :::::: | ::::: | : : : | : | : : i : 

Db 1039 VAMAAVMASASASASAVSVSSAVVSVASMASMASASVSASASCASAVAMSVVVSVSSSAS 980 

Qy 220 atgctatctagattccggtcttaatgagtacgtctgcagaaagactaataagtgctataa 279 

: : : : : | : : | : : : : : : I : : : I II:: I I 

Db 979 VSSSSSVSSSSASCRMSCAASAASAASVCGMSASMSMSAGASSVVSASAAASASASAASA 920

Qy 280 gagcttggtgctctgcgtggcgagttgtcaaccatcatcatgaattcaagatactgcgga 339 

: I : ::::::: : | : : : : : : : : | : : : | :::::: : : | 
Db 919 SAASASASASVASASASVSMASASMVVHASVVVVVSAVSVVSSASAVSMRVARVAAGVSA 860

Qy 340 gacatcatgatactgcggagacaga 364 

: i : I : I : : I : I : : : : I : I : I : : 
Db 859 SASAAMABASWAVTVVSSASASASM 835
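
The database line in this hit is written largely in IUPAC nucleotide ambiguity codes (S, V, M, R, W, B and so on) rather than plain a/c/g/t. The Python sketch below shows how a query base can be tested against such a code using the standard IUPAC table. Whether GenCore's "Conservative" count corresponds exactly to this kind of compatible-but-not-identical position is an assumption here, not something the output states, and `compatible` is an illustrative name.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(query_base: str, db_code: str) -> bool:
    """True if the IUPAC code in the database covers the query base."""
    return query_base.upper() in IUPAC[db_code.upper()]


print(compatible("a", "M"))  # True: M stands for A or C
print(compatible("t", "S"))  # False: S stands for C or G
print(compatible("g", "V"))  # True: V stands for A, C or G
```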



RESULT 7 

AW957338 

LOCUS       AW957338      674 bp    mRNA            EST       01-JUN-2000
DEFINITION  EST369528 MAGE resequences, MAGE Homo sapiens cDNA, mRNA sequence.
ACCESSION   AW957338
VERSION     AW957338.1  GI:8147141
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 674)
  AUTHORS   Hegde,P., Qi,R., Abernathy,K., Dharap,S., Gaspard,R., Gay,C.,
            Holt,I.E., Saeed,A.I., Sharov,V., Lee,N.H., Yeatman,T.J. and
            Quackenbush,J.
  TITLE     Assessment of gene expression patterns in a model of colon tumor
            metastasis using a 19,200 element cDNA microarray
  JOURNAL   Unpublished (2000)
COMMENT     Contact: John Quackenbush
            The Institute for Genomic Research
            9712 Medical Center Dr., Rockville, MD 20850, USA
            Tel: 301 838 3528
            Fax: 301 838 0208
            Email: johnq@tigr.org
            Plate: 106
            Seq primer: Reverse.
FEATURES             Location/Qualifiers
     source          1..674
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone_lib="MAGE resequences, MAGE"
                     /note="Vector: pBluescriptSKm"
BASE COUNT      233 a    133 c    158 g    150 t
ORIGIN



Query Match 9.1%; Score 40.6; DB 10; Length 674; 

Best Local Similarity 54.3%; Pred. No. 0.93; 

Matches 82; Conservative 0; Mismatches 69; Indels 0; Gaps 0;

Qy 93 cacccttggtgcccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgcta 152 

MINIM I I I I I I I I II I Mill II II M I I II 

Db 488 CAGCCTTGGAAGCGAGGCAAAAAGCAAAAGAAGTGCAGAAGAAGCTGGTGCATAATGCTC 547

Qy 153 ggaaaatgacgatgacaaagacgagggcatcgggcaacatacttgttagccgtaatgacg 212 

I II I I I I I I I I I I II II I I I I I II I II I M I I 
Db 548 TGGCAAATTTGGAGTCTATGGGTAAAACATCAGGGAAGCTGTTTGATAGCAGTGATGATG 607

Qy 213 acgggccatgctatctagattccggtcttaa 243 

Ml I II MM III III 
Db 608 ACGAATCTGATTCTTAAGATGACAGTAATAA 638



RESULT 8 
CNS00L6W/C 
LOCUS       CNS00L6W     1101 bp    DNA             GSS       03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC:
            BACR24H20 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL068145
VERSION     AL068145.1  GI:4958073
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 1101)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..1101
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR24H20"
                     /note="end : TET3"
BASE COUNT      288 a    223 c    181 g    313 t     96 others
ORIGIN



Query Match 8.9%; Score 39.4; DB 13; Length 1101; 

Best Local Similarity 32.2%; Pred. No. 2.2; 

Matches 48; Conservative 47; Mismatches 54; Indels 0; Gaps 0; 

Qy 15 cccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaataatccgggtgcc 74 

::::::::::: I : I : : I : I : : : I ::::::: : I I I I I I I I I I I 
Db 1083 MVMVVVGSVVSVGVGMAVSGCGVGMRCRCMCMCMMRMVMMMCCAAAAAAACCCGCGGGAC 1024

Qy 75 ttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggcaaacca 134 

I : : I I I : I I Mill I I I I I : : : : : : : : : : I : : I : : 

Db 1023 GGCMCMGCAGCAKCCCCCAACCCSCAAAGCCCAMCMRRVARRRRRRRGGRRRGGRGRGGG 964 

Qy 135 aagaggacagcaatgctaggaaaatgacg 163 

: I : I I : I I I I : I I I I I I I 

Db 963 RAARGAAGGSCAAGMAGCAGMAAACGACG 935 



RESULT 9 

AQ210844 

LOCUS       AQ210844      771 bp    DNA             GSS       18-SEP-1998
DEFINITION  HS_2230_A1_E06_MR CIT Approved Human Genomic Sperm Library D Homo
            sapiens genomic clone Plate=2230 Col=11 Row=I, DNA sequence.
ACCESSION   AQ210844
VERSION     AQ210844.1  GI:3619813
KEYWORDS    GSS.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 771)
  AUTHORS   Mahairas,G.G., Wallace,J.C., Smith,K., Swartzell,S., Holzman,T.,
            Keller,A., Shaker,R., Furlong,J., Young,J., Zhao,S., Adams,M.D. and
            Hood,L.
  TITLE     Sequence-tagged connectors: A sequence approach to mapping and
            scanning the human genome
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 96 (17), 9739-9744 (1999)
  MEDLINE   99380589
COMMENT     Contact: Mahairas GG, Wallace JC, Hood L
            High Throughput Sequencing Center
            University of Washington
            401 Queen Anne Avenue North, Seattle, WA 98109, USA
            Tel: (206) 616-3618
            Fax: (206) 616-3887
            Email: jwallace@u.washington.edu
            Sequence Tagged Connector
            Plate: 2230 row: I column: 11
            Class: BAC ends
            High quality sequence stop: 771.
FEATURES             Location/Qualifiers
     source          1..771
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="Plate=2230 Col=11 Row=I"
                     /clone_lib="CIT Approved Human Genomic Sperm Library D"
                     /sex="male"
                     /note="Organ: sperm; Vector: pBeloBAC11; BAC Clones in
                     E-Coli DH10B"
BASE COUNT      460 a    237 c     25 g     38 t     11 others
ORIGIN



Query Match 8.7%; Score 38.6; DB 13; Length 771; 

Best Local Similarity 50.8%; Pred. No. 3.4; 

Matches 92; Conservative 0; Mismatches 89; Indels 0; Gaps 0;

Qy 14 acccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaataatccgggtgc 73 

I I I I I I I M I I I II I I I I I I I I I I I I I III I II I 
Db 585 ACACAACACAACACACAACCAAAACCCAAATAAACAAACCACCAAACAACATAAACACCC 644

Qy 74 cttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggcaaacc 133 

III I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 645 AAACCACAAAACCAACAACCAACTCAAAACCCCAAATACACCCAAACAACAAAAAAAAAT 704

Qy 134 aaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcatcgggcaacata 193 

I I I I III II I I III I I I I I I III I I I I I 

Db 705 AAACAATACAAAAAAAACAACACCCACACCAACAAACACACAAAAACAACATACACCACA 764

Qy 194 c 194 
I 

Db 765 C 765 



RESULT 10 

BG034748 

LOCUS       BG034748     1642 bp    mRNA            EST       24-JAN-2001
DEFINITION  602301702F1 NIH_MGC_87 Homo sapiens cDNA clone IMAGE:4403256 5',
            mRNA sequence.
ACCESSION   BG034748
VERSION     BG034748.1  GI:12428371
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1642)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: DCTD/DTP
            cDNA Library Preparation: Life Technologies, Inc.
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLAM10113 row: c column: 01
            High quality sequence start: 18
            High quality sequence stop: 373.
FEATURES             Location/Qualifiers
     source          1..1642
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:4403256"
                     /clone_lib="NIH_MGC_87"
                     /tissue_type="mammary adenocarcinoma, cell line"
                     /lab_host="DH10B (phage-resistant)"
                     /note="Organ: breast; Vector: pCMV-SPORT6; Site_1: NotI;
                     Site_2: SalI; Cloned unidirectionally; oligo-dT primed.
                     Average insert size 1.383 kb. Library enriched for
                     full-length clones and constructed by Life Technologies.
                     Note: this is a NIH_MGC Library."
BASE COUNT      534 a    471 c    389 g    247 t      1 others
ORIGIN



Query Match 8.6%; Score 38.2; DB 11; Length 1642; 

Best Local Similarity 53.7%; Pred. No. 5.1; 

Matches 79; Conservative 0; Mismatches 68; Indels 0; Gaps 0;



Qy 45 caacaacctccaaagaataatccgggtgccttccaagaatcctccaaccacccttggtgc 104

I I I I I I I I I I I I 111 I I I I I I I I II III I 
Db 1127 CGACGACCAACGAACAACACGCAGGAAGCACCACAACAACCAGACACGCACACACAAACA 1186



Qy 105 ccaagcaagccacaaaaagtgggcaaaccaaagaggacagcaatgctaggaaaatgacga 164 

I I I I I I I I I J II I IT II I I I I I II I I I I III 

Db 1187 CAAAGCAACAAACACGAACACGACAGACGCAAGAAAAACACGAACGACAGAAGAACAGGA 1246

Qy 165 tgacaaagacgagggcatcgggcaaca 191 

II I I I I I I I I I I I I Mill 
Db 1247 GAACAGAGACGAGAGCACCAAGCGACA 1273



RESULT 11 

BF671369 

LOCUS       BF671369      697 bp    mRNA            EST       21-DEC-2000
DEFINITION  602151249F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4292245 5',
            mRNA sequence.
ACCESSION   BF671369
VERSION     BF671369.1  GI:11945264
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 697)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished (1999)
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: CLONETECH Laboratories, Inc.
            cDNA Library Preparation: CLONETECH Laboratories, Inc.
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Incyte Genomics, Inc.
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLCM1138 row: a column: 14
            High quality sequence stop: 456.
FEATURES             Location/Qualifiers
     source          1..697
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:4292245"
                     /clone_lib="NIH_MGC_81"
                     /lab_host="DH10B (T1 phage-resistant)"
                     /note="Organ: muscle (skeletal); Vector: pDNR-LIB
                     (Clontech); Site_1: SfiI (ggccgcctcggcc); Site_2: SfiI
                     (ggccattatggcc); 5' and 3' adaptors were used in cloning
                     as follows: 5' adaptor sequence: 5'-CACGGCCATTATGGCC-3'
                     and 3' adaptor sequence:
                     5'-ATTCTAGAGGCCGAGGCGGCCGACATG-dT(30)BN-3' (where B = A,
                     C, or G and N = A, C, G, or T). Average insert size
                     1.55 kb (range 1.0-4.0 kb). 15/15 colonies contained
                     inserts by PCR. This library was enriched for full-length
                     clones and was constructed by Clontech Laboratories (Palo
                     Alto, CA)."
BASE COUNT      264 a    159 c    144 g    130 t
ORIGIN



Query Match 8.6%; Score 38; DB 11; Length 697; 

Best Local Similarity 51.1%; Pred. No. 4.8; 

Matches 89; Conservative 0; Mismatches 85; Indels 0; Gaps 0; 

Qy 18 aaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaataatccgggtgccttc 77 

I I I I I I I I I I I I III II I I I I I I I 

Db 498 AACACAACCACACAAAAAAGAACAAAACTCAAATACTCGGCGGGCGCAACAGGGCCCCAG 557

Qy 78 caagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggcaaaccaaag 137 

I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 558 AAAGAAGCCTCTAAAAACACTCGGTGGCCATCGACGACACAAAGAGTGAACAGACCCCGG 617 

Qy 138 aggacagcaatgctaggaaaatgacgatgacaaagacgagggcatcgggcaaca 191 

I I I I I I I I II I I I I I II I I I I I I I I 

Db 618 GTCAAAACAGGCTCCCAAAGAAGACCCGGCCAAGAACGCGGCCGGCCCACCACA 671 



RESULT 12 



CNS006ST 
LOCUS       CNS006ST      937 bp    DNA             GSS       03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR14F16 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL065880
VERSION     AL065880.1  GI:4944848
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 937)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..937
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR14F16"
                     /note="end : TET3"
BASE COUNT      211 a     78 c     29 g    289 t    330 others
ORIGIN



Query Match 8.5%; Score 37.8; DB 13; Length 937; 

Best Local Similarity 23.2%; Pred. No. 5.8; 

Matches 41; Conservative 69; Mismatches 67; Indels 0; Gaps 0; 

Qy 15 cccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaataatccgggtgcc 74 

: : : I I I : : I : : I : : : : : I I : : I I : : I : I I I : I II : 
Db 594 MMMCAAAMMAMMAMAAMAAMMASAMAAMAAAMACMMCAMMCACAAMAMAAAMAMMMAAAA 653 

Qy 75 ttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaagtgggcaaacca 134

: I : : : : I I I : : : : : : : II I : I : I : : : : : : : I : I : : 

Db 654 ACMAGAAMMAMAAAAMAACAMMMMMMCAAAVMVMGCACVCMAARMMVMAMMVCMARAMRV 713 



Qy 135 aagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcatcgggcaaca 191 

: : : : I : : I : : : : : I I : I : I : I : I I : I : : : I : : I I : : : : : I 
Db 714 GMRMMRARCRMARASRMRVVAAMAMAMCMRAAASAGASASRRGRRGAACVVRGVGSA 770 



RESULT 13 
CNS0073W/C 
LOCUS       CNS0073W      922 bp    DNA             GSS       03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR14D09 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL066784
VERSION     AL066784.1  GI:4945247
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 922)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..922
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR14D09"
                     /note="end : TET3"
BASE COUNT      223 a     95 c    109 g    221 t    274 others
ORIGIN



Query Match 8.5%; Score 37.6; DB 13; Length 922; 

Best Local Similarity 21.6%; Pred. No. 6.6; 

Matches 37; Conservative 67; Mismatches 67; Indels 0; Gaps 0;

Qy 1 cgaaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaaga 60 

I : : M : : :::::: | | :: : : : I I : : : : : : : : : I : : I : 1 I : I : : I 



Db 860 CAMMAAMNMMMACMMMMCMMACMMAMCCMMACMMMAMAMMMMMMMMAMMAMCACMAMMMA 801



Qy 61 ataatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaa 120 

: : : : : : : : : I I I I : : : I : : I : : : : I : I : : I I : 

Db 800 CACMCAMMMMCMMMMMMMMCMMCMMCMCCACMMMACACMAMCCMMCMMCMMACMCMMAAM 741 

Qy 121 aagtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaa 171 

II I | : : : | | : : I : I : : I : I : : I : : I : I : I 

Db 740 AAMMMMACAMMAMAAMMMMMAMAAMMAAMMAAMMMMAMMCCMCCMAAMAMA 690



RESULT 14 
CNS04GFJ/C
LOCUS       CNS04GFJ     1128 bp    DNA             GSS       21-MAY-2000
DEFINITION  Tetraodon nigroviridis genome survey sequence PUC-Ori end of clone
            108K18 of library G from Tetraodon nigroviridis, genomic survey
            sequence.
ACCESSION   AL289576
VERSION     AL289576.1  GI:8028153
KEYWORDS    GSS; genome survey sequence.
SOURCE      Tetraodon nigroviridis.
  ORGANISM  Tetraodon nigroviridis
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
            Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes;
            Tetraodontidae; Tetraodon.
REFERENCE   1  (bases 1 to 1128)
  AUTHORS   Roest-Crollius,H., Jaillon,O., Dasilva,C., Fizames,C., Fisher,C.,
            Bouneau,L., Billault,A., Quetier,F., Saurin,W., Bernot,A. and
            Weissenbach,J.
  TITLE     Characterization and repeat analysis of the compact genome of the
            freshwater pufferfish Tetraodon nigroviridis
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1128)
  AUTHORS   Roest-Crollius,H., Jaillon,O., Dasilva,C., Bouneau,L., Fisher,C.,
            Bernot,A., Fizames,C., Wincker,P., Brottier,P., Quetier,F.,
            Saurin,W. and Weissenbach,J.
  TITLE     Human gene number estimate provided by genome wide analysis using
            Tetraodon nigroviridis DNA sequence
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 1128)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-APR-2000) to the EMBL/GenBank/DDBJ databases
COMMENT     This sequence is a single read and was generated as part of a large
            scale clone-end sequencing project of the Tetraodon nigroviridis
            genome. For more information, please take a look at
            http://www.genoscope.cns.fr/Tetraodon.
FEATURES             Location/Qualifiers
     source          1..1128
                     /organism="Tetraodon nigroviridis"
                     /db_xref="taxon:99883"
                     /clone="108K18"
                     /clone_lib="G"
                     /note="Genoscope sequence ID : C0BG108BF09SPl~end :
                     PUC-Ori"
BASE COUNT      137 a    193 c    228 g    400 t    170 others
ORIGIN



Query Match 8.5%; Score 37.6; DB 13; Length 1128;

Best Local Similarity 33.3%; Pred. No. 6.9; 

Matches 63; Conservative 44; Mismatches 82; Indels 0; Gaps 0; 

Qy 3 aaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaat 62 

I : i I : I : I I I : : I I I : II I : I II: : : I : : I : : I : I : I : 
Db 1002 ARAAVASGGGVGGCGRGGMCAAASACGAAAAARCGAAAAMMMMMAMMAMMMCAMAMMAMM 943

Qy 63 aatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaa 122 

: : : : : I I : : : I I I I I II: I I I I I : I I : I I I : I I 
Db 942 MMMAMRRCGCGCCMCSVGAVAACCMAAAAAAACASGCGMCCCACAAAMAAASAAAAAMAA 883

Qy 123 gtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagacgagggcat 182 

I : : I I : I I : I I : I : : : : I : I III : I I I : I : 
Db 882 AGCNCAAVMACAVAAARAAAAAVACMVMMMSCAMCACAACMAAAMACACMACSACRMACA 823 

Qy 183 cgggcaaca 191 

i II:: 
Db 822 CGAACCAMV 814 



RESULT 15 

CNS0075A 

LOCUS       CNS0075A      861 bp    DNA             GSS       03-JUN-1999
DEFINITION  Drosophila melanogaster genome survey sequence TET3 end of BAC #
            BACR14D11 of RPCI-98 library from Drosophila melanogaster (fruit
            fly), genomic survey sequence.
ACCESSION   AL066834
VERSION     AL066834.1  GI:4945297
KEYWORDS    GSS.
SOURCE      fruit fly.
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Tracheata; Hexapoda; Insecta;
            Pterygota; Neoptera; Endopterygota; Diptera; Brachycera;
            Muscomorpha; Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1  (bases 1 to 861)
  AUTHORS   Genoscope.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-JUN-1999) Genoscope - Centre National de Sequencage :
            BP 191 91006 EVRY cedex - FRANCE (E-mail : seqref@genoscope.cns.fr
            - Web : www.genoscope.cns.fr)
COMMENT     Determination of this BAC-end sequence was carried out as part of a
            collaboration with the Berkeley Drosophila Genome Project (BDGP).
            The BDGP is constructing a physical map of the Drosophila
            melanogaster genome using these BACs. For further information
            please see http://www.fruitfly.org The BDGP Drosophila
            melanogaster BAC library was prepared by Kazutoyo Osoegawa and
            Aaron Mammoser in Pieter de Jong's laboratory in the Department of
            Cancer Genetics at the Roswell Park Cancer Institute in Buffalo,
            NY. The library is named RPCI-98 and was constructed by partial
            EcoRI digestion of Drosophila DNA provided by the BDGP from the
            isogenic strain y2; cn bw sp, the same strain used for the BDGP's
            P1 and EST libraries. A more detailed description of the library
            and how to order individual BAC clones, the entire library, or
            filters for hybridization from the BACPAC Resource Center can be
            found at http://bacpac.med.buffalo.edu/drosophila_bac.htm.
FEATURES             Location/Qualifiers
     source          1..861
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
                     /clone_lib="RPCI-98"
                     /clone="BACR14D11"
                     /note="end : TET3"
BASE COUNT      313 a    224 c     30 g    119 t    175 others
ORIGIN



Query Match 8.4%; Score 37.2; DB 13; Length 861; 

Best Local Similarity 33.7%; Pred. No. 8.3; 

Matches 58; Conservative 40; Mismatches 74; Indels 0; Gaps 0;

Qy 3 aaaacactggtacccaaaacaaccgtcaaccaagggcaaattcaacaacctccaaagaat 62 

:::!::: I I I I i : I : I I I i : I : I I I Ml I : : : I I : I : II 

Db 547 MMMAMMMCACCACCCACMAMAAACMMCMAMCAAAAAAAAAAAAAMMMACAMAMAMCAAAM 606

Qy 63 aatccgggtgccttccaagaatcctccaaccacccttggtgcccaagcaagccacaaaaa 122 

I : : : : : : I I I : I I : I I I : I : I : I I I : : I I I I I I 

Db 607 AMAMMAMCMMMMMCMAAAMAMACCMAMAACAMCMCAAAAAAAAAMAMCAMMHAAAAAAAA 666 

Qy 123 gtgggcaaaccaaagaggacagcaatgctaggaaaatgacgatgacaaagac 174 

I I : : I : I : : I : : I h I I I : I I I : : : I 
Db 667 AAAAAACAAAMMAMAAMMMAMACMMAAAAAAAAMAAAMAMCAMCACCMMCMC 718 



Search completed: February 7, 2002, 08:20:45 
Job time: 18122 sec 



